
Lead Site Reliability Engineer
3 weeks ago
As the Lead Site Reliability Engineer (SRE), you will spearhead the design and implementation of observability and reliability strategies across our ServiceNow platform and integrated third-party systems. You'll lead the charge in establishing and maturing telemetry frameworks, ensuring visibility of the golden signals (latency, traffic, errors, and saturation) to drive proactive performance and availability management.
This role is both strategic and hands-on. You will mentor other engineers, collaborate with cross-functional teams, and influence platform-wide improvements. Your work will directly enhance system resilience, user experience, and operational excellence.
Key Responsibilities:
- Architect and implement telemetry and observability frameworks across ServiceNow and its ecosystem.
- Define and monitor golden signals to drive proactive SRE practices.
- Lead incident and problem management reviews, ensuring data-driven root cause analysis and continuous improvement.
- Collaborate with development, support, and infrastructure teams to implement self-healing, auto-remediation, and resiliency patterns.
- Develop and mature dashboards and real-time alerts using the ServiceNow platform along with tools like Datadog, Splunk, or Grafana.
- Drive automation for reliability checks, capacity planning, and environment health.
- Establish and promote SRE best practices, playbooks, and operational readiness standards across product teams.
- Represent SRE in architectural reviews and platform governance meetings.
- Mentor junior engineers, foster a learning culture, and ensure adoption of reliability-first principles.
Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field.
- 10+ years of IT experience, with 5+ years in SRE or production engineering, and 2+ years in a lead or principal role.
- Proven experience in managing observability, telemetry, and incident response frameworks at scale.
- Deep understanding of ITIL-aligned processes (Incident, Problem, Change).
- Strong leadership and collaboration skills, with the ability to influence across engineering and business teams.
- Excellent verbal and written communication, especially in articulating technical decisions to business stakeholders.
- Strong experience with monitoring tools such as Datadog, Splunk, Prometheus, Grafana, or equivalents.
- Proficient in ServiceNow platform administration, performance tuning, and API integrations.
- Solid command of Unix/Linux internals, system performance tuning, and network troubleshooting.
- Proficient in one or more scripting languages: Python, Shell, JavaScript.
- Hands-on experience with Kubernetes, containers, and CI/CD pipelines.
- Deep understanding of HTTP/S, DNS, SSL/TLS, and other web protocols.
- Familiarity with cloud platforms (AWS, Azure, or GCP); certifications preferred.
- Experience with ServiceNow ITOM modules like Event Management, AIOps, and Discovery.
- Knowledge of AI/ML-based anomaly detection and alerting strategies.
- Experience with infrastructure-as-code using tools like Ansible or Terraform.
- Familiarity with performance profiling and diagnostics of complex applications.
- Previous success in establishing SRE teams or practices from the ground up.
-
Lead Site Reliability Engineer
3 weeks ago
Hyderabad, India JPMorgan Chase & Co. Full time
Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability. As a Lead Site Reliability Engineer at JPMorgan Chase within the Consumer & Community Banking Team, you will take the lead in conducting resiliency design reviews, break down complex...
-
Lead Site Reliability Engineer
3 days ago
Hyderabad, Telangana, India JPMorgan Chase Full time ₹ 2,00,00,000 - ₹ 2,50,00,000 per year
Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability. As a Lead Site Reliability Engineer at JPMorgan Chase within the Consumer & Community Banking, you hold a leadership role in your team, demonstrate strong knowledge across...
-
Lead Site Reliability Engineer
4 days ago
Hyderabad, Telangana, India JPMorgan Chase Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability. As a Lead Site Reliability Engineer at JPMorgan Chase within the Consumer & Community Banking, you hold a leadership role in your team, demonstrate strong knowledge across multiple...
-
Lead Site Reliability Engineer
7 days ago
Hyderabad, Telangana, India EPAM Systems Full time ₹ 15,00,000 - ₹ 25,00,000 per year
We are seeking a skilled Lead Site Reliability Engineer to drive the stability, scalability, and reliability of our systems while improving efficiency through automation and best practices. This role calls for deep expertise in DevOps methodologies, Infrastructure as Code (IaC), and collaboration across teams to ensure optimal system...
-
Lead Site Reliability Engineer
3 days ago
Hyderabad, Telangana, India JPMorgan Chase Full time ₹ 20,00,000 - ₹ 25,00,000 per year
Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability. As a Lead Site Reliability Engineer at JPMorgan Chase within the Consumer & Community Banking, you hold a leadership role in your team, demonstrate strong knowledge across multiple...
-
Lead Site Reliability Engineer
4 weeks ago
Hyderabad, India Chase Bank Full time
Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability. As a Lead Site Reliability Engineer at JPMorgan Chase within the Consumer & Community Banking, you hold a leadership role in your team, demonstrate strong knowledge...
-
Senior Lead Site Reliability Engineer
3 weeks ago
Hyderabad, India JPMorgan Chase & Co. Full time
Elevate your engineering prowess to unprecedented levels by joining a team of exceptionally gifted professionals and position yourself among the top echelon in site reliability. As a Principal Site Reliability Engineer at JPMorgan Chase within the Consumer & Community Banking, you work with your fellow stakeholders to define non-functional requirements...
-
Lead - site reliability engineer
4 weeks ago
Hyderabad, India VXI Global Solutions Full time
We are looking for a Lead - Site Reliability Engineer with 8+ years of experience to design, implement, and manage robust observability solutions across our cloud infrastructure and applications. The ideal candidate will have hands-on experience with Prometheus, Grafana, Google Cloud Monitoring, and OpenTelemetry, along with exposure to ...