Associate Manager SRE

3 weeks ago


Hyderabad, Telangana, India PepsiCo Full time
Overview

We are seeking a self-driven, inquisitive Site Reliability Engineer (SRE) to drive reliability, availability, performance, and security across our global digital product ecosystem. This role is central to ensuring a seamless and resilient experience for our users by blending deep engineering expertise with operational excellence and automation.

You will be part of a global SRE practice supporting a portfolio of 260+ modern cloud-native applications across consumer, commercial, supply chain, and enablement functions. Your mission: prevent incidents before they occur, ensure rapid recovery when they do, and build scalable systems that evolve with our growing business.

Responsibilities

- Champion reliability, observability, and operational excellence across mission-critical applications.
- Develop and maintain service-level indicators (SLIs), objectives (SLOs), and error budgets to measure and improve system performance.
- Implement automated monitoring, alerting, and recovery mechanisms to reduce manual intervention and improve response times.
- Collaborate closely with software engineering, platform, and operations teams to embed SRE practices across the development lifecycle.
- Lead and participate in incident response, root cause analysis, and postmortem reviews to drive long-term improvements.
- Identify and eliminate sources of toil through automation, tooling, and process refinement.
- Continuously improve resiliency design, capacity planning, and release management in production systems.
- Influence engineering teams with best practices on cloud-native architecture, observability, and deployment strategies.

Qualifications

Required Skills:

- 5+ years of experience in production engineering, DevOps, or SRE roles.
- Strong foundation in Linux systems, networking, and cloud platforms (Azure, AWS, or GCP).
- Hands-on experience with observability tools (e.g., AppDynamics, Prometheus, Grafana, ELK, FullStory).
- Proficiency in scripting or programming (e.g., Python, Bash, Go) and automation frameworks (e.g., Ansible, Terraform).
- Deep understanding of CI/CD pipelines, release strategies, and deployment automation.
- Experience in managing high-scale, distributed systems in cloud-native environments.
- Strong analytical skills and a passion for continuous improvement.

Preferred Skills:

- Familiarity with microservices, Kubernetes, containers, and service mesh architecture.
- Exposure to incident and problem management frameworks (e.g., ITIL, RCA practices).
- Experience working in global teams supporting mission-critical applications.
