
Lead Site Reliability Engineer
3 weeks ago
Job Title:
SRE Lead (Engineering & Reliability)
Job Summary:
We are seeking an experienced and dynamic Site Reliability Engineering (SRE) Lead to oversee the reliability, scalability, and performance of our critical systems. As an SRE Lead, you will play a pivotal role in establishing and implementing SRE practices, leading a team of engineers, and driving automation, monitoring, and incident response strategies. This position combines software engineering and systems engineering expertise to build and maintain high-performing, reliable systems.
Experience:
6+ years
Key Responsibilities:
Reliability & Performance:
- Lead efforts to maintain high availability and reliability of critical services.
- Define and monitor SLIs, SLOs, and SLAs to ensure business requirements are met.
- Proactively identify and resolve performance bottlenecks and system inefficiencies.
Incident Management & Response:
- Establish and improve incident management processes and on-call rotations.
- Lead incident response and root cause analysis for high-priority outages.
- Drive post-incident reviews and ensure actionable insights are implemented.
Automation & Tooling:
- Develop and implement automated solutions to reduce manual operational tasks.
- Enhance system observability through metrics, logging, and distributed tracing tools (e.g., Prometheus, Grafana, Elastic APM).
- Optimize CI/CD pipelines for seamless deployments.
Collaboration:
- Partner with software engineering teams to improve the reliability of applications and infrastructure.
- Work closely with product and engineering teams to design scalable and robust systems.
- Ensure seamless integration of monitoring and alerting systems across teams.
Leadership & Team Building:
- Manage, mentor, and grow a team of SREs.
- Promote SRE best practices and foster a culture of reliability and performance across the organization.
- Drive performance reviews, skills development, and career progression for team members.
Capacity Planning & Cost Optimization:
- Perform capacity planning and implement autoscaling solutions to handle traffic spikes.
- Optimize infrastructure and cloud costs while maintaining reliability and performance.
Skills & Qualifications:
Required Skills:
- Technical Expertise:
  - Experience with cloud platforms (AWS, Azure, or GCP) and Kubernetes.
  - Hands-on knowledge of infrastructure-as-code tools like Terraform, Helm, or Ansible.
  - Proficiency in Java.
  - Expertise in distributed systems, databases, and load balancing.
- Monitoring & Observability:
  - Proficient with tools like Prometheus, Grafana, Elastic APM, or New Relic.
  - Understanding of metrics-driven approaches for system monitoring and alerting.
- Automation & CI/CD:
  - Hands-on experience with CI/CD pipelines (e.g., Jenkins, Azure Pipelines).
  - Skilled in automation frameworks and tools for infrastructure and application deployments.
- Incident Management:
  - Proven track record in handling incidents, post-mortems, and implementing solutions to prevent recurrence.
Leadership & Communication Skills:
- Strong people management and leadership skills with the ability to inspire and motivate teams.
- Excellent problem-solving and decision-making skills.
- Clear and concise communication, with the ability to translate technical concepts for non-technical stakeholders.
Preferred Skills:
- Experience with database optimization, Kafka, or other messaging systems.
- Knowledge of autoscaling techniques.
- Previous experience in an SRE, DevOps, or infrastructure engineering leadership role.
- Understanding of compliance and security best practices in distributed systems.
Why Join Us?
- Be a key driver in building and scaling reliable systems in a fast-paced environment.
- Work with cutting-edge technologies and influence the evolution of the infrastructure.
- Lead a high-impact team and foster a culture of reliability and innovation.
-
Lead Site Reliability Engineer
2 weeks ago
Bengaluru, India Landmark Group Full time
Job Description. Job Title: SRE Lead (Engineering & Reliability). Job Summary: We are seeking an experienced and dynamic Site Reliability Engineering (SRE) Lead to oversee the reliability, scalability, and performance of our critical systems. As an SRE Lead, you will play a pivotal role in establishing and implementing SRE practices, leading a team of...
-
Lead Site Reliability Engineer
4 weeks ago
Bengaluru, Karnataka, India Landmark Group Full time
Job Title: SRE Lead (Engineering & Reliability). Job Summary: We are seeking an experienced and dynamic Site Reliability Engineering (SRE) Lead to oversee the reliability, scalability, and performance of our critical systems. As an SRE Lead, you will play a pivotal role in establishing and implementing SRE practices, leading a team of engineers, and driving...
-
Lead Site Reliability Engineer
1 week ago
Bengaluru, Karnataka, India Landmark Group Full time ₹ 8,00,000 - ₹ 12,00,000 per year
Job Title: SRE Lead (Engineering & Reliability). Job Summary: We are seeking an experienced and dynamic Site Reliability Engineering (SRE) Lead to oversee the reliability, scalability, and performance of our critical systems. As an SRE Lead, you will play a pivotal role in establishing and implementing SRE practices, leading a team of engineers, and driving...
-
Site reliability engineer
2 days ago
Bengaluru, India HDFC Limited Full time
Hiring for Lead / Sr Site Reliability Engineer for Mumbai & Bangalore Location. Experience: 8 - 14 Years. Job Purpose: Analysing, troubleshooting, and designing vital services, platforms, and infrastructure on GCP while always thinking about reliability, scalability, resilience, security, and performance. Job Responsibilities: Help build a Site Reliability...
-
Site Reliability Engineer
3 days ago
Bengaluru, India HDFC Limited Full time
Hiring for Lead / Sr Site Reliability Engineer for Mumbai & Bangalore Location. Experience: 8 - 14 Years. Job Purpose: Analysing, troubleshooting, and designing vital services, platforms, and infrastructure on GCP while always thinking about reliability, scalability, resilience, security, and performance. Job Responsibilities: Help build a Site Reliability...
-
Site reliability engineer
3 days ago
Bengaluru, India HDFC Limited Full time
Hiring for Lead / Sr Site Reliability Engineer for Mumbai & Bangalore Location. Experience: 8 - 14 Years. Job Purpose: Analysing, troubleshooting, and designing vital services, platforms, and infrastructure on GCP while always thinking about reliability, scalability, resilience, security, and performance. Job Responsibilities: Help build a Site Reliability...
-
Lead Site Reliability Engineer, ITC
3 weeks ago
Bengaluru, India Nike Full time
Who You'll Work With: The SRE hired will work as a Reliability Engineer with the engineering teams. The candidate will belong to a horizontal domain called TechOps: Resilience Engineering. This position allows the SRE to shift between multiple engineering platforms as demanded by the work, vision and/or criticality of the projects. Roles and...
-
Lead Site Reliability Engineer, ITC
1 week ago
Bengaluru, Karnataka, India Nike Full time ₹ 8,00,000 - ₹ 12,00,000 per year
Who You'll Work With: The SRE hired will work as a Reliability Engineer with the engineering teams. The candidate will belong to a horizontal domain called TechOps: Resilience Engineering. This position allows the SRE to shift between multiple engineering platforms as demanded by the work, vision and/or criticality of the projects. Roles and...
-
Lead Site Reliability Engineer
1 week ago
Bengaluru, Karnataka, India Landmark Group Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Company: Landmark Group. Job Title: SRE Lead (Engineering & Reliability). Experience: 8-12 years. Job Summary: We are seeking an experienced and dynamic Site Reliability Engineering (SRE) Lead to oversee the reliability, scalability, and performance of our critical systems. As an SRE Lead, you will play a pivotal role in establishing and implementing SRE practices,...
-
Site Reliability Engineer
2 weeks ago
Bengaluru, India HDFC Limited Full time
Hiring for Lead / Sr Site Reliability Engineer for Mumbai & Bangalore Location. Experience: 8 - 14 Years. Job Purpose: Analysing, troubleshooting, and designing vital services, platforms, and infrastructure on GCP while always thinking about reliability, scalability, resilience, security, and performance. Job Responsibilities: Help build a Site Reliability...