
SRE Head
2 days ago
Job Title: SRE Head
Experience Level: ~10 years
Role Type: Engineering / Reliability
Role Overview:
The SRE Head is responsible for leading and scaling the Site Reliability Engineering (SRE) function across the organization. This role defines the reliability strategy, standards, and practices to ensure high availability, performance, and resilience of critical systems. The SRE Head partners with engineering, infrastructure, and operations teams to embed reliability, observability, and continuous improvement across all services.
Key Responsibilities:
- Lead and define the SRE strategy, operating model, and best practices across the organization.
- Establish and maintain SLIs, SLOs, and SLAs to measure and ensure service reliability and performance.
- Oversee incident management, post-incident reviews, and root cause analysis for major outages.
- Drive resilience engineering, disaster recovery, and chaos engineering initiatives.
- Collaborate with development, infrastructure, and operations teams to improve reliability and automation.
- Lead efforts to improve observability, including metrics, logging, and tracing frameworks.
- Foster a culture of proactive reliability, continuous learning, and blameless postmortems.
- Mentor and guide SRE leads and engineers, building high-performing reliability teams.
- Track and communicate reliability trends, key metrics, and risk areas to leadership.
- Evaluate and adopt emerging tools and practices to enhance platform reliability and scalability.
Required Qualifications & Experience:
- 10+ years of experience in SRE, reliability engineering, or production operations in large-scale environments.
- Proven expertise in availability management, incident response, and service continuity.
- Strong technical understanding of cloud platforms (GCP/AWS/Azure), Kubernetes, CI/CD, and automation.
- Proficiency in observability tools (e.g., Prometheus, Grafana, Dynatrace, Datadog, ELK, OpenTelemetry).
- Experience implementing SLIs/SLOs, error budgets, and capacity planning frameworks.
- Strong leadership, strategic thinking, and cross-functional collaboration skills.
- Excellent communication, mentoring, and culture-building abilities.
Desirable Skills:
- Experience in building and scaling SRE organizations or CoEs.
- Exposure to performance engineering, cost optimization, and AIOps practices.
- Deep understanding of network reliability, security resiliency, and compliance-driven uptime goals.
- Certification in reliability or cloud architecture (e.g., Google SRE, GCP Professional Architect).
-
Head Engineering
3 days ago
Mumbai, Maharashtra, India MS Forward Full time ₹ 20,00,000 - ₹ 25,00,000 per year
Position: Head Engineering - Digital Apps
Reports to: CTO
Location: Mumbai - Thane
Company: Leading Insurance company
Role Overview: We are looking for a senior technology leader to head the Digital Applications portfolio. This individual will be responsible for ensuring continuity of mission-critical applications, driving the build-out of our...
-
Techberryinfotech - Monitoring Head - APM
7 days ago
Mumbai, Maharashtra, India Techberry Infotech Full time ₹ 20,00,000 - ₹ 25,00,000 per year
About the job: Monitoring Head (APM Monitoring)
Experience: 10-12 Years
Location: Ghansoli Mahape
We are seeking a highly experienced and strategic Monitoring Head to lead our Application and Infrastructure Performance Management (APM) and Observability initiatives. This is a senior leadership role requiring deep technical...
-
Director_Java/Python AI Lead
4 days ago
Pune, Maharashtra, India Mancer Consulting Services Full time
Looking only for Diversity Candidates
Experience: 17+ years (70% hands-on + 30% management)
Reporting to: Managing Director
Your Role - What You’ll Do: As a GenAI and Google Cloud Architect and Engineer, you will be responsible for helping direct, create, review, and approve architectural designs for applications in the tribe. You are expected to be a...
-
SRE Head
4 days ago
Mumbai, India SID Global Solutions Full time