System Reliability Expert

7 days ago


Chennai, Tamil Nadu, India beBeeSite Full time ₹ 13,40,400 - ₹ 23,10,600
Job Description

We are seeking a skilled Senior Site Reliability Engineer to join our team. This is a hybrid role that combines Site Reliability Engineering (SRE) and DevOps principles, requiring a strong background in building and maintaining scalable systems.

The ideal candidate will have expertise in designing, deploying, and maintaining services across Microsoft Azure, including App Services, Container Apps, Cosmos DB, Event Hubs, Azure Monitor, VMs, and Azure Kubernetes Service (AKS).

They will also have experience in creating and managing networking (VNets, Subnets, NSGs) and identity/access controls (PIM, Managed Identities, Enterprise Applications, Role-based Access Control).

In addition, the candidate should be proficient in implementing infrastructure provisioning using Terraform/Bicep and ensuring cost-effective, scalable, and secure cloud environments.

Key Responsibilities
  • Cloud Infrastructure:
    • Design, deploy, and maintain services across Microsoft Azure.
    • Create and manage networking (VNets, Subnets, NSGs) and identity/access controls.
    • Implement infrastructure provisioning using Terraform/Bicep.
  • Monitoring, Observability & Incident Response:
    • Set up end-to-end observability using Prometheus, Grafana, Azure Monitor, ELK Stack, and Sentry.
    • Define and enforce standards for logging, metrics, traces, SLIs/SLOs, and error budgets.
    • Build proactive alerting systems for APIs, RabbitMQ, Databricks pipelines, and external integrations.
    • Establish on-call rotations and incident response runbooks, and lead root cause analyses (RCAs) to minimize mean time to recovery (MTTR).
  • CI/CD, Automation & Tooling:
    • Automate deployments and infrastructure lifecycle using GitHub Actions, Terraform modules, and CLI tools.
    • Improve CI/CD for faster, safer releases across containerized and VM-based workloads.
    • Build internal tools for diagnostics, rollback safety, and release automation.
    • Integrate resilience patterns like retries, circuit breakers, backoff strategies, and failovers.
  • DevOps & System Reliability:
    • Optimize system performance, memory usage, and availability for core services such as RabbitMQ, APIs, and analytics pipelines on Databricks.
    • Implement zero-downtime deployments, self-healing systems, and infrastructure audits.
    • Perform regular cost analysis, right-sizing, and tag-based budget enforcement.
  • Security & Compliance Collaboration:
    • Work with security teams to maintain infrastructure and data flow diagrams, and support ISO 27001, GDPR, and PDPA readiness.
    • Participate in threat modeling, define trust boundaries, and implement audit-ready infrastructure practices.
Tech Stack
  • Cloud: Microsoft Azure
  • IaC: Terraform, Bicep
  • CI/CD: Azure DevOps, GitHub Actions
  • Monitoring & Logs: Prometheus, Grafana, Azure Monitor, ELK, Sentry
  • Queueing: RabbitMQ, Kafka
  • Languages: Node.js, Python


  • Chennai, Tamil Nadu, India beBeeReliability Full time ₹ 1,00,00,000 - ₹ 2,00,00,000

    Job Title: Site Reliability Expert. Role Overview: We are seeking a highly skilled Site Reliability Engineer to lead reliability practices, ensure scalable systems, and collaborate with development teams to maintain highly available services. Key Responsibilities: Design, build, and operate reliable, scalable production services using infrastructure as code...

  • Reliability Expert

    1 week ago


    Chennai, Tamil Nadu, India beBeeReliability Full time ₹ 20,00,000 - ₹ 25,00,000

    Reliability Expert. We are seeking a skilled Reliability Engineer to join our team. Job Description: In this role, you will be responsible for translating product management reliability goals into actionable, testable objectives. You will perform statistical data analysis, Accelerated Life Testing (ALT) and modeling, and risk assessment to ensure the highest...


  • Chennai, Tamil Nadu, India beBeeEngineering Full time ₹ 1,50,00,000 - ₹ 2,00,00,000

    About Site Reliability Engineering: Site Reliability Engineering (SRE) is a discipline that combines software and systems engineering to build reliable and efficient systems. In this role, you will be part of a team that aims to achieve high levels of system reliability by working closely with product development teams, Cloud Infrastructure, and other SRE...


  • Chennai, Tamil Nadu, India beBeeReliability Full time ₹ 15,00,000 - ₹ 20,00,000

    Job Overview: We are seeking a highly skilled Reliability Engineer to join our team. The ideal candidate will have expertise in designing and implementing reliable systems, as well as experience with Kubernetes, Containers, Cloud, and Database. The Reliability Engineer will be responsible for ensuring the availability, reliability, and performance of our...


  • Chennai, Tamil Nadu, India beBeeInfrastructure Full time ₹ 20,00,000 - ₹ 25,00,000

    Job Description: We are seeking a highly skilled System Reliability Professional to join our team. This individual will be responsible for ensuring the availability, scalability, and resilience of our platform services in production. Key Responsibilities: Design and build cloud infrastructure using Infrastructure as Code (IaC) tools. Implement robust monitoring,...


  • Chennai, Tamil Nadu, India beBeeReliability Full time ₹ 2,00,00,000 - ₹ 2,50,00,000

    Reliable Systems Engineer Position: A Reliability Engineer leads the development and operation of high-reliability systems, making key technical decisions and collaborating with teams. The reliability mission is to design, build, and maintain highly reliable systems that support business growth. By quantitatively measuring and managing system reliability,...


  • Chennai, Tamil Nadu, India beBeeEngineer Full time ₹ 18,00,000 - ₹ 24,00,000

    Job Title: Senior Site Reliability Engineer. As a seasoned engineer, you will be responsible for ensuring the reliability and performance of our applications and systems.


  • Chennai, Tamil Nadu, India beBeeReliability Full time ₹ 18,00,000 - ₹ 25,00,000

    We are seeking an experienced Reliability Engineer to lead the development and operation of highly reliable and scalable products. This position is responsible for making technical decisions, collaborating with development teams and platform engineers, and building and operating systems that meet business objectives. Key responsibilities include quantitatively...


  • Chennai, Tamil Nadu, India beBeeSystem Full time ₹ 1,75,00,000 - ₹ 2,34,50,000

    About the Role: The system reliability engineer will be responsible for ensuring the stability, scalability, and operational excellence of finance platforms. This role involves leading the operational health of these platforms to deliver reliable financial applications and data services that meet accuracy, compliance, and availability requirements for business...


  • Chennai, Tamil Nadu, India beBeeSre Full time ₹ 15,00,000 - ₹ 20,20,000

    Hiring a seasoned IT professional for a prominent real estate software company. Job Overview: We are seeking an experienced Senior Site Reliability Engineer to provide technical leadership and mentorship, driving the design and implementation of solutions that enhance platform reliability. You will also develop and maintain monitoring, alerting, and logging...