
Cloud Reliability Engineer
22 hours ago
Are you looking for a challenging role where you can utilize your skills to drive business growth?
About the Job
This position is responsible for ensuring the availability, latency, performance, and efficiency of our cloud-based platform. In this role, you will:
- Create customer-centric service level indicators (SLIs) and service level objectives (SLOs) for key services and publish them quarterly.
- Implement an error budget policy backed by multi-window, multi-burn-rate alerts, clear runbooks, and defined paging thresholds.
- Maintain SLO and error-budget dashboards using monitoring tools, and run weekly SLO reviews with engineering and product teams.
- Drive trade-off decisions when error budgets are at risk, and lead reliability initiatives.
- Lead high-severity incidents without drama, owning communications, running blameless postmortems, and making corrective actions stick.
- Engineer reliability into AKS clusters at scale: harden the clusters, optimize node/pod density, and manage ingress traffic.
- Build observability that works by implementing metrics, traces, and logs with various tools.
- Implement infrastructure as code (IaC) and policy: use Terraform/Bicep modules, GitOps (Flux/Argo), and policy-as-code (Azure Policy/OPA Gatekeeper) so that no environment becomes a hand-configured snowflake.
- Ensure CI/CD reliability by using Azure DevOps/GitHub Actions with canary/blue-green deployments, progressive delivery, auto-rollback, and secrets management.
- Optimize capacity & performance through load testing, right-sizing, autoscaling, and partnering with FinOps to reduce spend without hurting SLOs.
- Design disaster recovery (DR) capabilities you can trust: define RTO/RPO, test backups and restores, run game days and chaos drills, and validate Azure Site Recovery (ASR) and multi-region failover.
- Make systems secure by default using identity and access management, managed identities, secrets rotation, VNets/NSGs/Private Link, and shift-left security checks in CI.
- Reduce toil by automating recurring operations, building self-service runbooks/chatops, and publishing golden paths for product teams.
Requirements:
- Bachelor's degree in Computer Science or a related field.
- 12+ years of experience in production operations/platform/SRE, including 5+ years on Azure.
- PostgreSQL: deep operational expertise, including HA/DR, logical/physical replication, performance tuning, autovacuum strategy, partitioning, backup/restore testing, and connection pooling.
- Azure core: AKS, Front Door/App Gateway, API Management, VNets/NSGs/Private Link, Storage, Key Vault, Redis, Service Bus/Event Hubs.
- Observability: Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana; SLO design and error-budget operations.
- IaC/automation: Terraform and/or Bicep; PowerShell and Python; GitOps (Flux/Argo); pipelines in Azure DevOps or GitHub Actions.
- Proven incident leadership at scale, blameless postmortems, and SLO/error-budget governance with change gating.
- Strong mentorship skills and crisp written and verbal communication.
- Strong understanding of cloud computing and scalability.
- Excellent problem-solving and analytical skills.
- Ability to work in a fast-paced environment and adapt to changing requirements.
Preferred Qualifications:
- Experience with Apache NiFi, Apache Flink, Apache Kafka or Redpanda.
- Azure Solutions Architect Expert certification.
- ITSM (ServiceNow) and on-call tooling (PagerDuty/Opsgenie).
- Compliance/SecOps (SOC 2, ISO 27001), policy-as-code, workload identity.
- OpenTelemetry, eBPF tooling, or service mesh.