Lead Site Reliability Engineer

7 days ago


Bengaluru India Landmark Group Full time

Job Description

Job Title: SRE Lead (Engineering & Reliability)

Job Summary:

We are seeking an experienced and dynamic Site Reliability Engineering (SRE) Lead to oversee the reliability, scalability, and performance of our critical systems. As an SRE Lead, you will play a pivotal role in establishing and implementing SRE practices, leading a team of engineers, and driving automation, monitoring, and incident response strategies. This position combines software engineering and systems engineering expertise to build and maintain high-performing, reliable systems.

Experience: 6+ years

Key Responsibilities:

Reliability & Performance:

- Lead efforts to maintain high availability and reliability of critical services.
- Define and monitor SLIs, SLOs, and SLAs to ensure business requirements are met.
- Proactively identify and resolve performance bottlenecks and system inefficiencies.

Incident Management & Response:

- Establish and improve incident management processes and on-call rotations.
- Lead incident response and root cause analysis for high-priority outages.
- Drive post-incident reviews and ensure actionable insights are implemented.

Automation & Tooling:

- Develop and implement automated solutions to reduce manual operational tasks.
- Enhance system observability through metrics, logging, and distributed tracing tools (e.g., Prometheus, Grafana, Elastic APM).
- Optimize CI/CD pipelines for seamless deployments.

Collaboration:

- Partner with software engineering teams to improve the reliability of applications and infrastructure.
- Work closely with product/ engineering teams to design scalable and robust systems.
- Ensure seamless integration of monitoring and alerting systems across teams.

Leadership & Team Building:

- Manage, mentor, and grow a team of SREs.
- Promote SRE best practices and foster a culture of reliability and performance across the organization.
- Drive performance reviews, skills development, and career progression for team members.

Capacity Planning & Cost Optimization:

- Perform capacity planning and implement autoscaling solutions to handle traffic spikes.
- Optimize infrastructure and cloud costs while maintaining reliability and performance.

Skills & Qualifications:

Required Skills:

- Technical Expertise:
- Experience with cloud platforms (AWS / Azure / GCP) and Kubernetes.
- Hands-on knowledge of infrastructure-as-code tools like Terraform /Helm/ Ansible.
- Proficiency in Java
- Expertise in distributed systems, databases, and load balancing.
- Monitoring & Observability:
- Proficient with tools like Prometheus, Grafana, Elastic APM, or new relic.
- Understanding of metrics-driven approaches for system monitoring and alerting.
- Automation & CI/CD:
- Hands-on experience with CI/CD pipelines (e.g., Jenkins, Azure Pipelines etc.).
- Skilled in automation frameworks and tools for infrastructure and application deployments.
- Incident Management:
- Proven track record in handling incidents, post-mortems, and implementing solutions to prevent recurrence.

Leadership & Communication Skills:

- Strong people management and leadership skills with the ability to inspire and motivate teams.
- Excellent problem-solving and decision-making skills.
- Clear and concise communication, with the ability to translate technical concepts for non-technical stakeholders.

Preferred Skills:

- Experience with database optimization, Kafka, or other messaging systems.
- Knowledge of autoscaling techniques
- Previous experience in an SRE, DevOps, or infrastructure engineering leadership role.
- Understanding of compliance and security best practices in distributed systems.

Why Join Us

- Be a key driver in building and scaling reliable systems in a fast-paced environment.
- Work with cutting-edge technologies and influence the evolution of the infrastructure.
- Lead a high-impact team and foster a culture of reliability and innovation.



  • Hyderabad, India Chase Bank Full time

    Job Description Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability. As a Lead Site Reliability Engineer at JPMorgan Chase within the Consumer & Community Banking, youhold a leadership role in your team, demonstrate strong knowledge...


  • Bengaluru, Karnataka, India JPMorganChase Full time ₹ 20,00,000 - ₹ 25,00,000 per year

    JOB DESCRIPTIONAssume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability.As a Lead Site Reliability Engineer at JPMorgan Chase within the Commercial and Investment Banking - Digital team, you hold a leadership role in your team, demonstrate...


  • india Synechron Full time

    We have immediate opportunity forSRE (Senior Site Reliability Engineer) 5 to 9 years. Synechron –BangaloreJob Role: -SRE (Senior Site Reliability Engineer) Job Location: -Bangalore Notice Period:Within 30daysAbout Synechron We began life in 2001 as a small, self-funded team of technology specialists. Since then, we’ve grown our organization to 14,500+...


  • India Sprinto Full time

    Job DescriptionSprinto is a leading platform that automates information security compliance. By raising the bar on information security, Sprinto ensures compliance, healthy operational practices, and the ability for businesses to grow and scale with unwavering confidence. We are a team of 200+ employees & helping 1000+ Customers across 75+ Countries. We are...


  • Bengaluru, India VidPro Consultancy Services Full time

    Job Description Experience: 2.55 Years Location: Bangalore (On-site) Work Mode: 5 Days WFO Mandatory Skills: Site Reliability engineer or SRE ,Linux, System architecture, TCP/IP. HTTP,DNS ,Grafana, Prometheus and Loki Troubleshooting ,Root cause, complex systems ,Ci/CD, Docker, Kubernetes Experience : 2-4 years of relevant experience Key Skills...


  • India Employ Full time

    Role - Site Reliability Engineer (SRE)/ Platform Engineering/ or Dev Ops Engineering roles Location – Fully Remote Type - 6 months Contract Work Ex - 5+ Yrs We’re working with a AI product company that’s building the next generation of Gen AI powered developer platforms . We’re looking for an experienced Site Reliability Engineer to join...


  • India Elgebra Full time

    Hiring: Site Reliability Engineer – 7+ Years Location: Bangalore / Chennai Payroll: Elgebra


  • India Akamai Full time

    Do you want to grow your career in Linux and Site Reliability Engineering? Would you like to contribute to the foundation of a new public cloud platform? Join our IaaS Site Reliability Engineering (SRE) team. We design, develop, and operate infrastructure and services that power the backbone of our cloud platform. This is a rare opportunity to help build a...


  • India Akamai Full time ₹ 5,00,000 - ₹ 15,00,000 per year

    Do you want to grow your career in Linux and Site Reliability Engineering?Would you like to contribute to the foundation of a new public cloud platform?Join our IaaS Site Reliability Engineering (SRE) team.We design, develop, and operate infrastructure and services that power the backbone of our cloud platform. This is a rare opportunity to help build a...


  • India Concord Full time

    SRE Sr. Engineers (Individual Contributors) Key Attributes: - Strong SRE (Site Reliability Engineering) experience - DevOps skills – CI/CD, monitoring, automation, infrastructure as code, etc. - Excellent troubleshooting and debugging skills (infrastructure + application level) - Perseverance – must push through complex/challenging issues without...