Site Reliability Engineer
3 weeks ago
Job Description :
We are looking for an experienced Azure Site Reliability Engineer (SRE) with 6-9 years of experience to support and administer Azure Kubernetes Service (AKS) clusters running critical middleware handling thousands of transactions per second (TPS).
The ideal candidate will have a strong background in Infrastructure as Code (IaC), cloud networking, automation, and observability to ensure high availability, scalability, and reliability.
This role requires an engineering-first mindset, focusing on IaC-driven deployments, automation, monitoring, and operational excellence while maintaining a 99.999% availability target.
Key Responsibilities :
- Deploy, manage, and maintain Azure Kubernetes Service (AKS) clusters with a focus on scalability and availability.
- Handle cluster cutovers, base image updates, and IaC-driven changes.
- Apply SRE principles to ensure high availability and resiliency.
- Write and maintain Terraform scripts for IaC deployments.
- Manage Kubernetes configurations using Helm charts.
- Automate infrastructure provisioning, scaling, and disaster recovery processes.
- Implement GitOps methodologies using ArgoCD for deployment automation.
- Implement and maintain monitoring & logging solutions using ELK Stack (Elasticsearch, Logstash, Kibana) or Grafana Loki.
- Analyze logs and metrics to proactively detect issues.
- Integrate OpenTelemetry (preferred) for distributed tracing and observability.
- Ensure compliance with security best practices for handling sensitive data in regulated environments.
- Implement and manage secrets using HashiCorp Vault (preferred).
- Maintain secure cloud networking within Azure.
- Build and optimize CI/CD pipelines using GitHub Actions (preferred) or any CI/CD tool.
- Automate deployments using GitOps principles with ArgoCD.
- Optimize build, release, and rollback processes for high availability.
- Conduct disaster recovery testing and build fault-tolerant systems.
- Respond to production incidents, troubleshoot issues, and implement long-term fixes.
- Participate in an on-call rotation to ensure system reliability.
Required Skills & Qualifications :
Cloud :
- Azure (Must-have) with strong networking expertise.
- Terraform (Hands-on experience required).
Container Orchestration :
- Kubernetes (AKS) with Helm.
- GitHub Actions (preferred) or any CI/CD tool.
- ArgoCD for deployment automation.
- Proficiency in Python or Golang (any one required).
- Experience with ELK Stack or Grafana Loki.
- Linux with networking skills.
- OpenTelemetry (preferred).
- HashiCorp Vault (preferred).
- Familiarity with security and compliance best practices.
- Bachelor's degree in Computer Science, Engineering, or related fields (or equivalent experience).
- 6-9 years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering.
- Minimum 2 years of hands-on experience as an SRE working with Azure, Kubernetes, and Terraform.
- Experience working in highly available, regulated, and security-sensitive environments.
Why Join Us :
- Work with highly scalable, mission-critical systems handling thousands of transactions per second.
- Be part of a fast-paced, DevOps-driven engineering culture.
- Leverage the latest cloud-native technologies and automation frameworks.
- Competitive compensation and growth opportunities.
(ref:hirist.tech)
-
Site Reliability Engineer
4 days ago
Chennai, Tamil Nadu, India Bright Vision Technologies Full time
Bright Vision Technologies has an immediate Full-time opportunity for Site Reliability Engineer (SRE). Job Role: Site Reliability Engineer (SRE) Job Type: Full Time. Candidates looking for visa sponsorship and willing to relocate to USA are encouraged to apply. About Bright Vision Technologies: Bright Vision Technologies is a fast-growing technology company...
-
Site Reliability Engineer
5 days ago
Chennai, Tamil Nadu, India 10decoders Full time
JD: Site Reliability Engineer - GCP With Terraform The Role: We are looking for a Senior SRE with 5+ years of experience to work primarily with our Application development team. An ideal candidate would have extensive experience building cloud infrastructure on Google Cloud with Terraform and have strong experience running workloads that scale on Google's...
-
Site Reliability Engineer
3 weeks ago
Chennai, Tamil Nadu, India 10decoders Full time
JD: Site Reliability Engineer - GCP With Terraform The Role: We are looking for a Senior SRE with 5+ years of experience to work primarily with our Application development team. An ideal candidate would have extensive experience building cloud infrastructure on Google Cloud with Terraform and have strong experience running workloads that scale on Google's...
-
Site Reliability Engineer
3 weeks ago
Chennai, Tamil Nadu, India Burgeon It Services Pvt Ltd Full time
Job Title : SRE Engineer Location : Chennai Experience : 8+ Years Job Description : We are seeking an experienced Site Reliability Engineer (SRE) to join our dynamic team. The ideal candidate will have a strong background in software engineering and operations, with a passion for building scalable and reliable systems. Key Responsibilities : - Design, implement,...
-
Site Reliability Engineer
2 weeks ago
Chennai, Tamil Nadu, India Burgeon It Services Pvt Ltd Full time
Job Title : SRE Engineer Location : Chennai Experience : 8+ Years Job Description : We are seeking an experienced Site Reliability Engineer (SRE) to join our dynamic team. The ideal candidate will have a strong background in software engineering and operations, with a passion for building scalable and reliable systems. Key Responsibilities : - Design, implement,...
-
Site Reliability Engineer
3 weeks ago
Chennai, Tamil Nadu, India 10decoders Full time
Job Summary: We are seeking a Senior Site Reliability Engineer (SRE) with 5+ years of experience to join our team and work primarily with our Application development team. The ideal candidate will have extensive experience building cloud infrastructure on Google Cloud Platform using Terraform and strong experience running workloads that scale on Google's...
-
Site Reliability Engineer
2 weeks ago
Chennai, Tamil Nadu, India 10decoders Full time
JD: Site Reliability Engineer - GCP With Terraform The Role: We are looking for a Senior SRE with 5+ years of experience to work primarily with our Application development team. An ideal candidate would have extensive experience building cloud infrastructure on Google Cloud with Terraform and have strong experience running workloads that scale on Google's Kubernetes...
-
Site Reliability Engineer
2 weeks ago
Chennai, Tamil Nadu, India Zf Friedrich Full time
Job Description : Req ID 77489 | GEC Chennai, India, ZF Commercial Vehicle Control Systems India Limited. Long Description: About the Team: The Garuda team is an SRE team responsible for the reliability and operations of our Fleet management services platform. We ensure the availability and performance of the platform through proactive incident management,...
-
Site Reliability Engineer
3 weeks ago
Chennai, Tamil Nadu, India Kiash Solutions LLp Full time
We are hiring a Site Reliability Engineer (SRE) with strong expertise in Azure operations, containerized workflows (Docker), and Python scripting. The ideal candidate will lead efforts to ensure system reliability, automate operational tasks, and optimize cloud-based infrastructure, while collaborating with cross-functional teams to deliver high-performing...
-
Site Reliability Engineer
3 weeks ago
Chennai, Tamil Nadu, India 10decoders Full time
JD: Site Reliability Engineer - GCP With Terraform The Role: We are looking for a Senior SRE with 5+ years of experience to work primarily with our Application development team. An ideal candidate would have extensive experience building cloud infrastructure on Google Cloud with Terraform and have strong experience running workloads that scale on Google's...
-
Site Reliability Engineer
1 week ago
Chennai, Tamil Nadu, India ZF Group Full time
Job Description: About the Team: The Garuda team is an SRE team responsible for the reliability and operations of our Fleet management services platform. We ensure the availability and performance of the platform through proactive incident management, optimization, and continuous improvement while contributing to the development of SCALAR's...
-
Site Reliability Engineer
2 weeks ago
Chennai, Tamil Nadu, India Everstage Inc. Full time
Everstage is looking to hire a Site Reliability Engineer. Please write to bharath@everstage.com if the opportunity below excites you. We are seeking a skilled and motivated Site Reliability Engineer (SRE) with at least 2 years of experience in maintaining and optimising infrastructure. The ideal candidate will be responsible for ensuring system reliability...
-
Site Reliability Engineer
3 weeks ago
Chennai, Tamil Nadu, India 10decoders Full time
Job Description: The Role: We are seeking a Senior Site Reliability Engineer with 5+ years of experience to work closely with our Application Development team. Responsibilities: Contribute to establishing best practices and shaping the SRE culture within our organization. Collaborate with teams to design, build, and improve Google Cloud infrastructure using...
-
Site Reliability Engineering Manager
3 days ago
Chennai, Tamil Nadu, India Bastion Data Solutions Full time
Become a part of Bastion Data Solutions' mission to deliver exceptional data solutions. Responsibilities: This on-site role at Bastion Data Solutions in Chennai requires a strong background in Site Reliability Engineering, software development, and system administration. Main duties will include: Ensuring site reliability and performance; Developing software...
-
Site Reliability Engineer
5 days ago
Chennai, Tamil Nadu, India triSys Full time
Job Description: Experience: 5-8 yrs. Job Location : Chennai/Pune/Gurgaon/Kolkata. We are seeking a highly skilled and experienced Site Reliability Engineer (SRE) with a deep understanding of SRE principles and practices. This role will be instrumental in shaping and guiding the SRE journey, ensuring high availability, reliability, and performance. The ideal...
-
Senior Site Reliability Engineer
5 days ago
Chennai, Tamil Nadu, India Tredence Inc. Full time
Site Reliability Engineer (SRE). Experience: 8-12 yrs. Pune/Chennai/Gurgaon/Kolkata. We are seeking a highly skilled and experienced Site Reliability Engineer (SRE) with a deep understanding of SRE principles and practices. This role will be instrumental in shaping and guiding the SRE journey, ensuring high availability, reliability, and performance. The...
-
Site Reliability Engineer
2 weeks ago
Chennai, Tamil Nadu, India Natobotics Technologies Pvt Limited Full time
Site Reliability Engineer - Server Support (SRE - SES) Location : Chennai, Hyderabad, Pune, Bangalore Experience : 4-7 Years Notice Period : 0-30 Days About the Role : We are urgently seeking experienced Site Reliability Engineers - Server Support (SRE - SES) to join our growing team. As an SRE - SES, you will be responsible for ensuring the high availability,...
-
Senior Site Reliability Engineer
1 week ago
Chennai, Tamil Nadu, India Tredence Inc. Full time
Site Reliability Engineer (SRE). Experience: 8-12 yrs. Pune/Chennai/Gurgaon/Kolkata. We are seeking a highly skilled and experienced Site Reliability Engineer (SRE) with a deep understanding of SRE principles and practices. This role will be instrumental in shaping and guiding the SRE journey, ensuring high availability, reliability, and performance. The ideal...
-
Senior Site Reliability Engineer
6 days ago
Chennai, Tamil Nadu, India Tredence Inc. Full time
Job Title: Site Reliability Engineer (SRE). Experience Level: 8-12 years. Locations: Pune, Chennai, Gurgaon, Kolkata. We are seeking a highly skilled and experienced Site Reliability Engineer (SRE) to shape and guide our SRE journey. The ideal candidate will bring both technical expertise and SRE knowledge to establish robust observability, incident...
-
Site Reliability Engineering Manager
5 days ago
Chennai, Tamil Nadu, India Zuora Full time
As a Site Reliability Engineering Manager at Zuora, you will be responsible for leading a team of talented engineers to leverage their expertise in cloud technologies, system design, troubleshooting, automation, and AI to scale and work across Product Engineering, Customer Support, Product Management, and Global Services to deliver Site and Customer...