Lead Site Reliability Engineer
6 hours ago
Job Title: Site Reliability Engineering (SRE) Lead
Location: Hinjewadi Phase-1 (WFO, work from office)
Experience: 7+ years
Shift Time: 11:00 AM to 8:00 PM
Working Days: Monday to Friday

About the Role
We are seeking a highly skilled and experienced SRE Lead to drive the reliability, scalability, and performance of our multi-cloud infrastructure spanning AWS and Azure. You will lead a team responsible for building and maintaining automated deployment pipelines, infrastructure as code, and observability systems using GitHub Actions, Terraform, and Datadog. As the SRE Lead, you will collaborate closely with development, operations, and security teams to ensure our services are highly available, secure, and performant, while fostering a culture of automation, monitoring, and continuous improvement.

Key Responsibilities
- Lead and mentor a team of SRE engineers to design, build, and maintain reliable, scalable, and secure cloud infrastructure across AWS and Azure.
- Architect and implement Infrastructure as Code (IaC) solutions, primarily using Terraform, to manage multi-cloud environments efficiently.
- Develop, maintain, and optimize CI/CD pipelines with GitHub Actions to enable fast and reliable software delivery.
- Establish and drive best practices in site reliability, monitoring, alerting, and incident response using Datadog and other observability tools.
- Collaborate with software engineering teams to improve system reliability through automation, load testing, and performance tuning.
- Define and track SLOs, SLIs, and error budgets; lead incident retrospectives and continuous improvement initiatives.
- Manage cloud resource costs and optimize usage across multiple cloud providers.
- Promote a DevOps culture emphasizing automation, continuous deployment, and proactive incident management.
- Stay current with the latest industry trends and technologies in cloud, automation, and SRE practices.
Required Skills
- 7+ years of experience in Site Reliability Engineering, DevOps, or cloud infrastructure roles.
- Experience implementing dashboards to monitor and track SLOs, SLIs, and error budgets, and leading incident retrospectives and continuous improvement initiatives.
- Proven experience leading and mentoring engineering teams.
- Strong hands-on experience with the AWS and Azure cloud platforms.
- Expertise in Infrastructure as Code using Terraform for multi-cloud deployments.
- Proficiency in building and managing CI/CD pipelines using GitHub Actions.
- Deep knowledge of monitoring and observability tools, especially Datadog.
- Solid understanding of networking, security, container orchestration (Kubernetes is a plus), and cloud-native architectures.
- Strong scripting and automation skills (Python, Bash, or similar).
- Experience with incident management, root cause analysis, and capacity planning.
- Excellent communication, leadership, and collaboration skills.

Technical Skills
- IaC: Terraform
- CI/CD: GitHub Actions, Git workflows, and Argo CD
- Observability: Datadog, Prometheus, and Fluent Bit
- Pod orchestration: EKS and EKS on Fargate
- Cloud: AWS and Azure

Preferred
- Certifications such as AWS Certified DevOps Engineer, Azure DevOps Engineer, or HashiCorp Terraform Associate.
- Experience with Kubernetes and service mesh technologies.
- Familiarity with chaos engineering and resilience testing.
- Knowledge of security best practices in cloud environments.
-
Site Reliability Engineer
3 weeks ago
India, IN Sonata Software Full time
We're Hiring: Senior Site Reliability Engineer. Location: Onsite (Hyderabad office, mandatory from Day 1). Employment Type: Full-time. Notice Period: Immediate to 15 days only. Experience: 8+ years. About the Role: We're looking for a Senior Site Reliability Engineer (SRE) to lead reliability initiatives across our production systems. This is a high-impact...
-
Site Reliability Engineer
2 days ago
India Akamai Technologies Full time
Job Description: Do you like collaborating across teams to solve complex problems? Do you enjoy solving large-scale distributed content delivery challenges? Join our highly skilled Compute Site Reliability team. Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We...
-
Noida, India BOLD Full time
Job Description: BOLD is seeking professionals who will be responsible for performing build and release activities on the Microsoft technology stack. This person will also manage CI/CD pipelines and automate the build and deployment process. He/she will also work collaboratively with different teams, including Dev, QA, and infrastructure...
-
Site Reliability Engineer
2 weeks ago
Bengaluru, India Relanto Full time
Job Title: Site Reliability Engineer. Summary: We are looking for a Site Reliability Engineer to join our Digital & Transformation department. The ideal candidate will have 2-3 years of experience in this field and will be responsible for ensuring the reliability, availability, and performance of our systems and applications. Roles and...
-
Team Lead, Site Reliability Engineering
4 days ago
India London Stock Exchange Group Full time
Company Profile: LSEG (London Stock Exchange Group) is a world-leading financial markets infrastructure and data business. We are dedicated open-access partners with a commitment to excellence in delivering services across Data & Analytics, Capital Markets, and Post Trade. Backed by three hundred years of experience, innovative technologies, and a team of over 23,000...
-
Site Reliability Engineer
4 weeks ago
Bengaluru, Karnataka, India HDFC Limited Full time
Hiring for Lead/Senior Site Reliability Engineer for Mumbai and Bangalore locations. Experience: 8-14 years. Job Purpose: Analysing, troubleshooting, and designing vital services, platforms, and infrastructure on GCP while always thinking about reliability, scalability, resilience, security, and performance. Job Responsibilities: Help build a Site Reliability...
-
Lead Site Reliability Engineer
3 weeks ago
Bengaluru, India Groupon Full time
Job Description: Groupon is a marketplace where customers discover new experiences and services every day and local businesses thrive. To date, we have worked with over a million merchant partners worldwide, connecting over 16 million customers with deals across various categories. In a world often dominated by e-commerce giants, we stand out as one of the few...
-
Site Reliability Engineering Manager
7 days ago
Kolkata, India CloudHire Full time
Job Summary: The Technical Manager for Site Reliability Engineering (SRE) will lead a remote team of Site Reliability Engineers, ensuring operational excellence and fostering a high-performing team culture. Reporting to the US-based Director of Systems and Security, this role is responsible for overseeing day-to-day operations, technical...
-
Site Reliability Engineer
1 week ago
India Capgemini Full time
Choosing Capgemini means choosing a company where you will be empowered to shape your career in the way you'd like, where you'll be supported and inspired by a collaborative community of colleagues around the world, and where you'll be able to reimagine what's possible. Join us and help the world's leading organizations unlock the value of...