
SRE Lead Design
14 hours ago
We are looking for a self-driven SRE engineer with a software engineering mindset to:
- Drive new shift-left activities that apply Site Reliability Engineering (SRE) and quality assurance principles within the application design and project roadmap, enabling resilient outcomes.
- Apply a pre-emptive approach in production to minimize business impact, through SRE-driven orchestration that connects all components of the ecosystem, diagnosing anomalies before users are affected and remediating them through automation.
This role is a critical enabler for achieving high resiliency during operations and for continuous improvement through design across the software development lifecycle.
The Lead SRE design & support engineer is an integral part of the global team, whose main purpose is to provide a delightful customer experience for users of the global consumer, commercial, supply chain and enablement functions in the PepsiCo digital products application portfolio of 260+ applications, enabling a full SRE practice with an incident prevention / proactive resolution model.
The scope of this role is focused on cloud-architecture, full-stack application development, B2B pepsiconnect, Direct to Customer and other S&T roadmap applications.
The engineer ensures that PepsiCo DPA applications deliver the service performance, reliability and availability expected by our customers and internal groups.
The role requires a blend of technical expertise in SRE tools, modern application cloud architecture (i.e. full stack), IT operations experience, and analytics and influencing skills.
Responsibilities
- Ensure ecosystem availability and performance in production environments, proactively preventing P1, P2 and potential P3 incidents.
- Engage and influence product and engineering teams during the design and development phases to embed reliability and operability into new services, defining and enforcing events, logging, monitoring, and observability standards across applications.
- Be accountable for ensuring that non-functional requirements (NFRs), including SLA/SLO/SLI and error budgets, are embedded early into the product's offerings as part of the engineering solution.
- Lead the team in diagnosing anomalies before they reach any user, and drive the necessary remediations across the teams involved in end-to-end availability, performance and consumption of the cloud-architected application ecosystem, leveraging SRE orchestration solutions.
- Collaborate with engineering and support teams, including participation in escalations and blameless postmortems.
- Work closely with customer-facing support teams to empower them with SRE insights and tooling.
- Observe, diagnose and improve the end-to-end ecosystem performance of the modern-architected application portfolio (i.e. a technical understanding of the interactions within a full-stack application), alongside peer SRE team members.
- Continuously optimize L2/support operations work via SRE workflow automation.
- Shape the SRE orchestration platform design with input from Production Operations, business usage, and product and engineering teams.
- Actively engage in and drive AI Ops adoption across teams.
Qualifications
- 9-13 years of work experience evolving into an SRE engineer role, including 3-5 years of experience in continuously improving and transforming IT operations ways of working
- Bachelor's degree in Computer Science, Information Technology or a related field
- Proven experience as an SRE in designing event diagnostics, performance measures and alerting solutions to meet SLAs/SLOs/SLIs.
- The ideal engineer will be highly quantitative, have great judgment, be able to connect dots across ecosystems, and work efficiently and cross-functionally across teams to ensure SRE orchestration solutions meet customer/end-user expectations.
- The candidate will take a pragmatic approach to resolving incidents, including the ability to systematically triangulate root causes and work effectively with external and internal teams to meet objectives.
- Strong expertise in SRE (Site Reliability Engineering) and IT Service Management (ITSM) processes, with a track record of improving service offerings: proactively resolving incidents, providing a seamless customer/end-user experience, and identifying and mitigating areas of risk.
- Hands-on experience in Python, SQL/NoSQL (MySQL, MongoDB, Cassandra, PostgreSQL), AppDynamics, ELK Stack, Grafana, Splunk, Dynatrace, Kafka and any SRE Ops toolsets.
- A firm understanding of cloud architecture for distributed environments.
- Front-end technologies: HTML, CSS, JavaScript, and frameworks like React, Angular, or Vue.js.
- Back-end technologies: server-side languages and frameworks (Java, Spring Boot, and related technologies) that build the server-side logic, APIs, and database interaction with MySQL, MongoDB, Cassandra, Couchbase.
- Infrastructure: Azure/AWS cloud platforms and/or client/server environments.
- Prior experience in shaping transformation and developing SRE solutions would be a plus.
-
Sre Lead
1 week ago
Hyderabad, Telangana, India People Prime Worldwide Full time About Client: One of our MNC clients offers technology consulting and digital solutions to global enterprises across industries, enabling transformative scale at unparalleled speed. With 145,000 professionals across 90 countries helping 1,100 clients, it provides a full spectrum of services including consulting information technology enterprise...
-
SRE Design
10 hours ago
Hyderabad, Telangana, India Pepsico Full time Overview: We are looking for a self-driven, software engineering mindset SRE engineer to - Drive new shift left activities critical to apply Site Reliability Engineering (SRE) and quality assurance principles within the application design / Project roadmap that enables resilient outcomes - Apply pre-emptive approach into production minimizing business impact,...
-
Azure SRE Lead
2 weeks ago
Hyderabad, Telangana, India beBeeAzureSre Full time ₹ 9,00,000 - ₹ 12,00,000 Job Title: Azure SRE Lead. Experience: 6-8 years of experience in Azure System Reliability Engineering is required. Location: Pune or Hyderabad only. Hybrid Work Mode: This role can be performed from home or our office location. Employment Type: Full-time employment with a competitive compensation package. Mandatory Skills: Azure SRE; Strong understanding of cloud...
-
SRE Architect
1 week ago
Hyderabad, Telangana, India Zensar Technologies Full time ₹ 15,00,000 - ₹ 20,00,000 per year Job Title: DevOps/SRE Lead. Location: Pune / Hyderabad. Job Type: Fulltime. Experience: 15+ years. Job Overview: We are seeking a highly experienced DevOps/SRE Lead with over 15 years of professional experience. The ideal candidate will possess a deep understanding of DevOps principles, extensive experience in Site Reliability Engineering (SRE), and a strong...
-
sre
1 week ago
Hyderabad, Telangana, India TechVedika Full time US$ 90,000 - US$ 1,20,000 per year Company Description: TechVedika is a technology services company specializing in AI/ML, Product Engineering, and Cloud-based solutions. Since our founding in 2010, we have been committed to providing innovative technology solutions to enterprise clients across various industries, including Manufacturing, BFSI, Healthcare, IT, Supply Chain & Logistics, Retail,...
-
Lead SRE
1 day ago
Hyderabad, Telangana, India VXI Global Solutions Llc Full time ₹ 5,00,000 - ₹ 8,00,000 per year Job Summary: We are seeking a skilled Observability Engineer to design, implement, and manage robust observability solutions across our cloud infrastructure and applications. The ideal candidate will have hands-on experience with Prometheus, Grafana, Google Cloud Monitoring, and OpenTelemetry, along with exposure to SolarWinds. You should be comfortable...
-
Senior SRE
2 weeks ago
Hyderabad, Telangana, India Insight Global Full time Position: Sr. SRE Location: HITEC City, Hyderabad, IN (Hybrid onsite 3 days/week) Duration: Full-time Pay: $ LPA Client Summary: This position is for a leading global telecommunications company known for its innovation in wireless technology and customer-centric services. As part of their digital transformation journey, they are investing...
-
Senior SRE
2 weeks ago
Hyderabad, Telangana, India Insight Global Full time Position: Sr. SRE Location: HITEC City, Hyderabad, IN (Hybrid onsite 3 days/week) Duration: Full-time Pay: $23 - 25 LPA Client Summary: This position is for a leading global telecommunications company known for its innovation in wireless technology and customer-centric services. As part of their digital transformation journey, they are investing heavily...
-
SRE Manager
1 week ago
Hyderabad, Telangana, India Ivy Full time ₹ 1,04,000 - ₹ 1,30,878 per year Company Description: Entain India is the engineering and delivery powerhouse for Entain, one of the world's leading global sports and gaming groups. Established in Hyderabad in 2001, we've grown from a small tech hub into a dynamic force, delivering cutting-edge software solutions and support services that power billions of transactions for millions of users...
-
Associate Manager SRE
2 weeks ago
Hyderabad, Telangana, India Pepsico Full time Overview: We are seeking a self-driven, inquisitive, and curious Site Reliability Engineer (SRE) to drive reliability, availability, performance, and security across our global digital product ecosystem. This role is central to ensuring a seamless and resilient experience for our users by blending deep engineering expertise with operational excellence and...