Lead Site Reliability Engineer

3 weeks ago


Bengaluru, India | Landmark Group | Full time

Job Title:
SRE Lead (Engineering & Reliability)

Job Summary:

We are seeking an experienced and dynamic Site Reliability Engineering (SRE) Lead to oversee the reliability, scalability, and performance of our critical systems. As an SRE Lead, you will play a pivotal role in establishing and implementing SRE practices, leading a team of engineers, and driving automation, monitoring, and incident response strategies. This position combines software engineering and systems engineering expertise to build and maintain high-performing, reliable systems.

Experience:
6+ years

Key Responsibilities:

Reliability & Performance:

  • Lead efforts to maintain high availability and reliability of critical services.
  • Define and monitor SLIs, SLOs, and SLAs to ensure business requirements are met.
  • Proactively identify and resolve performance bottlenecks and system inefficiencies.

Incident Management & Response:

  • Establish and improve incident management processes and on-call rotations.
  • Lead incident response and root cause analysis for high-priority outages.
  • Drive post-incident reviews and ensure actionable insights are implemented.

Automation & Tooling:

  • Develop and implement automated solutions to reduce manual operational tasks.
  • Enhance system observability through metrics, logging, and distributed tracing tools (e.g., Prometheus, Grafana, Elastic APM).
  • Optimize CI/CD pipelines for seamless deployments.

Collaboration:

  • Partner with software engineering teams to improve the reliability of applications and infrastructure.
  • Work closely with product and engineering teams to design scalable and robust systems.
  • Ensure seamless integration of monitoring and alerting systems across teams.

Leadership & Team Building:

  • Manage, mentor, and grow a team of SREs.
  • Promote SRE best practices and foster a culture of reliability and performance across the organization.
  • Drive performance reviews, skills development, and career progression for team members.

Capacity Planning & Cost Optimization:

  • Perform capacity planning and implement autoscaling solutions to handle traffic spikes.
  • Optimize infrastructure and cloud costs while maintaining reliability and performance.

Skills & Qualifications:

Required Skills:

Technical Expertise:

  • Experience with cloud platforms (AWS, Azure, or GCP) and Kubernetes.
  • Hands-on knowledge of infrastructure-as-code tools such as Terraform, Helm, or Ansible.
  • Proficiency in Java.
  • Expertise in distributed systems, databases, and load balancing.

Monitoring & Observability:

  • Proficiency with tools such as Prometheus, Grafana, Elastic APM, or New Relic.
  • Understanding of metrics-driven approaches to system monitoring and alerting.

Automation & CI/CD:

  • Hands-on experience with CI/CD pipelines (e.g., Jenkins, Azure Pipelines).
  • Skill with automation frameworks and tools for infrastructure and application deployments.

Incident Management:

  • Proven track record of handling incidents, running post-mortems, and implementing solutions to prevent recurrence.

Leadership & Communication Skills:

  • Strong people management and leadership skills with the ability to inspire and motivate teams.
  • Excellent problem-solving and decision-making skills.
  • Clear and concise communication, with the ability to translate technical concepts for non-technical stakeholders.

Preferred Skills:

  • Experience with database optimization, and with Kafka or other messaging systems.
  • Knowledge of autoscaling techniques.
  • Previous experience in an SRE, DevOps, or infrastructure engineering leadership role.
  • Understanding of compliance and security best practices in distributed systems.

Why Join Us?

  • Be a key driver in building and scaling reliable systems in a fast-paced environment.
  • Work with cutting-edge technologies and influence the evolution of the infrastructure.
  • Lead a high-impact team and foster a culture of reliability and innovation.
