Cloud Reliability Engineer

22 hours ago


Salem, Tamil Nadu, India beBeeReliability Full time ₹ 1,20,00,000 - ₹ 2,42,50,000

Are you looking for a challenging role where you can utilize your skills to drive business growth?

About the Job

This position is responsible for ensuring the availability, latency, performance, and efficiency of our cloud-based platform.

  • Create customer-centric service level indicators (SLIs) and service level objectives (SLOs) for key services and publish them quarterly.
  • Implement an error-budget policy with multi-window, multi-burn-rate alerts, clear runbooks, and defined paging thresholds.
  • Maintain SLO and error-budget (EB) dashboards using monitoring tools, and run weekly SLO reviews with engineering and product teams.
  • Drive strategic trade-offs when error budgets are at risk, and lead reliability initiatives.
  • Lead high-severity incidents without drama, owning communications, running blameless postmortems, and making corrective actions stick.
  • Engineer reliability into AKS clusters at scale: harden them, optimize node/pod density, and manage ingress traffic.
  • Oversee observability that works by instrumenting metrics, traces, and logs with the appropriate tools.
  • Implement infrastructure as code (IaC) and policy: use Terraform/Bicep modules, GitOps (Flux/Argo), and policy-as-code (Azure Policy/OPA Gatekeeper) so that no environment becomes a hand-configured snowflake.
  • Ensure CI/CD reliability by using Azure DevOps/GitHub Actions with canary/blue-green deployments, progressive delivery, auto-rollback, and secrets management.
  • Optimize capacity & performance through load testing, right-sizing, autoscaling, and partnering with FinOps to reduce spend without hurting SLOs.
  • Design disaster recovery (DR) capabilities you can trust: define RTO/RPO, test backups and restores, run game days and chaos drills, and validate Azure Site Recovery (ASR) and multi-region failover.
  • Secure systems by default by leveraging identity and access management solutions, managed identities, secrets rotation, VNets/NSGs/Private Link, and shift-left checks in CI.
  • Reduce toil by automating recurring operations, building self-service runbooks/chatops, and publishing golden paths for product teams.

Required Qualifications:

  • Bachelor's degree in Computer Science or related field.
  • 12+ years of experience in production operations/platform/SRE, including 5+ years on Azure.
  • PostgreSQL: deep operational expertise incl. HA/DR, logical/physical replication, performance tuning, autovacuum strategy, partitioning, backup/restore testing, and connection pooling.
  • Azure core: AKS, Front Door/App Gateway, API Management, VNets/NSGs/Private Link, Storage, Key Vault, Redis, Service Bus/Event Hubs.
  • Observability: Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana; SLO design and error-budget operations.
  • IaC/automation: Terraform and/or Bicep; PowerShell and Python; GitOps (Flux/Argo). Pipelines in Azure DevOps or GitHub Actions.
  • Proven incident leadership at scale, blameless postmortems, and SLO/error-budget governance with change gating.
  • Mentorship and crisp written/verbal communication.

Requirements:

  • Strong understanding of cloud computing and scalability.
  • Excellent problem-solving and analytical skills.
  • Ability to work in a fast-paced environment and adapt to changing requirements.

Preferred Qualifications:

  • Experience with Apache NiFi, Apache Flink, Apache Kafka or Redpanda.
  • Azure Solutions Architect Expert certification.
  • ITSM (ServiceNow) and on-call tooling (PagerDuty/Opsgenie).
  • Compliance/SecOps (SOC 2, ISO 27001), policy-as-code, workload identity.
  • OpenTelemetry, eBPF tooling, or service mesh.


  • Salem, Tamil Nadu, India beBeeReliability Full time US$ 15,00,000 - US$ 20,00,000

    Job Overview: The Cloud Reliability Engineer Leader will play a critical role in ensuring the stability, scalability, and operational excellence of Accounting and Finance platforms. This role is focused on leading the operational health of these platforms, ensuring the delivery of highly reliable financial applications and data services that meet the demanding...


  • Salem, Tamil Nadu, India beBeeCloudNetwork Full time ₹ 2,00,00,000 - ₹ 2,50,00,000

    Site Reliability and Network Engineer. About this role: This position centers on securely designing, deploying, automating, and monitoring traditional and cloud network infrastructure. Design and deploy secure network infrastructure ensuring compliance with regulatory frameworks. Implement regular network infrastructure audits and compliance checks using Ansible,...


  • Salem, Tamil Nadu, India beBeeSoftware Full time ₹ 1,80,00,000 - ₹ 2,50,00,000

    Job Overview: We treat infrastructure and operations as software engineering problems. Our mission is to build and evolve software platforms that enable services to be provisioned and managed in safe, reliable, and scalable ways. We challenge the status quo and use new technologies to build platforms and tooling for engineering teams. In this role, you will...


  • Salem, Tamil Nadu, India beBeeSre Full time ₹ 1,80,00,000 - ₹ 2,00,00,000

    Reliability Engineer Leader. Job Description: This is an exciting opportunity to shape the SRE function within our organisation and to be a founding member of the Group SRE team. We are seeking a highly skilled and experienced engineer to join our team at Natobotics. As a system reliability leader, you will define, drive, and implement the SRE strategy across...

  • Reliability Expert

    3 hours ago


    Salem, Tamil Nadu, India beBeePerformance Full time ₹ 2,00,00,000 - ₹ 2,50,00,000

    Job Description: Seeking an experienced professional to join our team as a Site Reliability Engineer. The ideal candidate will have a strong understanding of distributed systems, cloud platforms, and microservices architecture. Responsibilities include monitoring, observability, and performance optimization for web and mobile applications. This is a challenging...


  • Salem, Tamil Nadu, India beBeeSite Full time ₹ 1,50,00,000 - ₹ 2,00,00,000

    Job Opportunity: We are currently seeking a skilled Observability Engineer (Site Reliability) to join our team. This role involves building and fine-tuning platform components for the Observability product, working closely with the lead engineer, the performance team, and the data ingestion, platform DevOps, and data visualization teams within the Observability product. This...

  • Cloud Data Engineer

    4 days ago


    Salem, Tamil Nadu, India beBeeAWS Full time ₹ 20,00,000 - ₹ 25,00,000

    Cloud Data Engineer Position: This is an exciting opportunity for a seasoned Cloud Data Engineer to work with our team in Bangalore. Our organization specializes in delivering expert solutions to businesses, and we are seeking a highly skilled professional to help us achieve this goal. The ideal candidate will have experience in designing and implementing...


  • Salem, Tamil Nadu, India MindBrain Full time

    Position: Site Reliability Engineer. Budget: 1.7 LPM. Experience: 10 years. Duration: 6 months. Technical skills: Programming: proficiency in languages such as Python. Operating systems: deep understanding of Linux/Windows operating systems and networking concepts. Cloud technologies: experience with Azure, including its services, architecture, and best practices. ...


  • Salem, Tamil Nadu, India beBeeSpecialist Full time ₹ 2,00,00,000 - ₹ 2,50,00,000

    Senior Cloud Infrastructure Specialist. We're looking for a skilled Senior Cloud Infrastructure Specialist to join our Platform Engineering Practice. To succeed in this role, you'll need 7+ years of experience in Site Reliability Engineering, DevOps, or Cloud Engineering, with at least 4 years of hands-on experience with ELK (Elasticsearch, Logstash,...

  • Site Reliability Engineer

    37 minutes ago


    Salem, Tamil Nadu, India beBeeSiteReliabilityEngineer Full time ₹ 1,10,00,000 - ₹ 1,70,00,000

    The role of a Site Reliability Engineer is to ensure the stability and scalability of financial platforms. This position requires building automation, implementing monitoring, improving incident response, and championing DevOps practices to enable Finance and Accounting systems to operate with consistency and trustworthiness. Key Responsibilities...