
System Reliability Expert
7 days ago
We are seeking a skilled Senior Site Reliability Engineer to join our team. This is a hybrid role that combines Site Reliability Engineering (SRE) and DevOps principles, requiring a strong background in building and maintaining scalable systems.
The ideal candidate will have expertise in designing, deploying, and maintaining services across Microsoft Azure, including App Service, Container Apps, Cosmos DB, Event Hubs, Azure Monitor, Virtual Machines, and Azure Kubernetes Service (AKS).
They should also have experience creating and managing networking (VNets, subnets, NSGs) and identity and access controls (Privileged Identity Management (PIM), managed identities, Enterprise Applications, and role-based access control (RBAC)).
In addition, the candidate should be proficient in provisioning infrastructure with Terraform/Bicep and in maintaining cost-effective, scalable, and secure cloud environments.
Key Responsibilities:
- Cloud Infrastructure:
  - Design, deploy, and maintain services across Microsoft Azure.
  - Create and manage networking (VNets, subnets, NSGs) and identity/access controls.
  - Provision infrastructure as code using Terraform/Bicep.
- Monitoring, Observability & Incident Response:
  - Set up end-to-end observability using Prometheus, Grafana, Azure Monitor, the ELK Stack, and Sentry.
  - Define and enforce standards for logging, metrics, traces, SLIs/SLOs, and error budgets.
  - Build proactive alerting for APIs, RabbitMQ, Databricks pipelines, and external integrations.
  - Establish on-call rotations and incident response runbooks, and lead root cause analyses (RCAs) to reduce Mean Time to Recovery (MTTR).
- CI/CD, Automation & Tooling:
  - Automate deployments and the infrastructure lifecycle using GitHub Actions, Terraform modules, and CLI tools.
  - Improve CI/CD pipelines for faster, safer releases across containerized and VM-based workloads.
  - Build internal tools for diagnostics, rollback safety, and release automation.
  - Integrate resilience patterns such as retries, circuit breakers, backoff strategies, and failover.
- DevOps & System Reliability:
  - Optimize performance, memory usage, and availability of core services such as RabbitMQ, APIs, and analytics pipelines on Databricks.
  - Implement zero-downtime deployments, self-healing systems, and infrastructure audits.
  - Perform regular cost analysis, right-sizing, and tag-based budget enforcement.
- Security & Compliance Collaboration:
  - Work with security teams to maintain infrastructure and data flow diagrams and to support ISO 27001, GDPR, and PDPA readiness.
  - Participate in threat modeling, define trust boundaries, and implement audit-ready infrastructure practices.
Tech Stack:
- Cloud: Microsoft Azure
- IaC: Terraform, Bicep
- CI/CD: Azure DevOps, GitHub Actions
- Monitoring & Logs: Prometheus, Grafana, Azure Monitor, ELK, Sentry
- Queueing: RabbitMQ, Kafka
- Languages: Node.js, Python
-
Expert System Reliability Specialist
2 weeks ago
Chennai, Tamil Nadu, India | Full time | ₹ 1,00,00,000 - ₹ 2,00,00,000
Job Title: Site Reliability Expert
Role Overview: We are seeking a highly skilled Site Reliability Engineer to lead reliability practices, ensure scalable systems, and collaborate with development teams to maintain highly available services.
Key Responsibilities: Design, build, and operate reliable, scalable production services using infrastructure as code...
-
Reliability Expert
1 week ago
Chennai, Tamil Nadu, India | Full time | ₹ 20,00,000 - ₹ 25,00,000
Reliability Expert
We are seeking a skilled Reliability Engineer to join our team.
Job Description: In this role, you will be responsible for translating product management reliability goals into actionable, testable objectives. You will perform statistical data analysis, Accelerated Life Testing (ALT) and modeling, and risk assessment to ensure the highest...
-
Reliable System Engineer
2 weeks ago
Chennai, Tamil Nadu, India | Full time | ₹ 1,50,00,000 - ₹ 2,00,00,000
About Site Reliability Engineering
Site Reliability Engineering (SRE) is a discipline that combines software and systems engineering to build reliable and efficient systems.
In this role, you will be part of a team that aims to achieve high levels of system reliability by working closely with product development teams, Cloud Infrastructure, and other SRE...
-
Reliability Systems Architect
2 weeks ago
Chennai, Tamil Nadu, India | Full time | ₹ 15,00,000 - ₹ 20,00,000
Job Overview
We are seeking a highly skilled Reliability Engineer to join our team. The ideal candidate will have expertise in designing and implementing reliable systems, as well as experience with Kubernetes, containers, cloud platforms, and databases.
The Reliability Engineer will be responsible for ensuring the availability, reliability, and performance of our...
-
Reliable Systems Engineer
7 days ago
Chennai, Tamil Nadu, India | Full time | ₹ 20,00,000 - ₹ 25,00,000
Job Description
We are seeking a highly skilled System Reliability Professional to join our team. This individual will be responsible for ensuring the availability, scalability, and resilience of our platform services in production.
Key Responsibilities: Design and build cloud infrastructure using Infrastructure as Code (IaC) tools. Implement robust monitoring,...
-
Highly Reliable Systems Specialist
2 weeks ago
Chennai, Tamil Nadu, India | Full time | ₹ 2,00,00,000 - ₹ 2,50,00,000
Reliable Systems Engineer Position
A Reliability Engineer leads the development and operation of high-reliability systems, making key technical decisions and collaborating with teams.
The reliability mission is to design, build, and maintain highly reliable systems that support business growth. By quantitatively measuring and managing system reliability,...
-
Reliable Applications Expert
2 weeks ago
Chennai, Tamil Nadu, India | Full time | ₹ 18,00,000 - ₹ 24,00,000
Job Title: Senior Site Reliability Engineer
As a seasoned engineer, you will be responsible for ensuring the reliability and performance of our applications and systems.
-
Senior Systems Reliability Leader
6 days ago
Chennai, Tamil Nadu, India | Full time | ₹ 18,00,000 - ₹ 25,00,000
We are seeking an experienced Reliability Engineer to lead the development and operation of highly reliable and scalable products.
This position is responsible for making technical decisions, collaborating with development teams and platform engineers, and building and operating systems that meet business objectives.
Key responsibilities include quantitatively...
-
Senior Systems Reliability Engineer
1 week ago
Chennai, Tamil Nadu, India | Full time | ₹ 1,75,00,000 - ₹ 2,34,50,000
About the Role: The system reliability engineer will be responsible for ensuring the stability, scalability and operational excellence of finance platforms.
This role involves leading the operational health of these platforms to deliver reliable financial applications and data services that meet accuracy, compliance and availability requirements for business...
-
Reliable IT Professional Wanted
1 week ago
Chennai, Tamil Nadu, India | Full time | ₹ 15,00,000 - ₹ 20,20,000
Hiring a seasoned IT professional for a prominent real estate software company.
Job Overview: We are seeking an experienced Senior Site Reliability Engineer to provide technical leadership and mentorship, driving the design and implementation of solutions that enhance platform reliability. You will also develop and maintain monitoring, alerting, and logging...