Lead Site Reliability Engineer



Company: Landmark Group
Location: Bangalore, India
Employment type: Full time

Job Title: SRE Lead (Engineering & Reliability)

Job Summary:

We are seeking an experienced and dynamic Site Reliability Engineering (SRE) Lead to oversee the reliability, scalability, and performance of our critical systems. As an SRE Lead, you will play a pivotal role in establishing and implementing SRE practices, leading a team of engineers, and driving automation, monitoring, and incident response strategies. This position combines software engineering and systems engineering expertise to build and maintain high-performing, reliable systems.

Experience: 6+ years

Key Responsibilities:

Reliability & Performance:

  • Lead efforts to maintain high availability and reliability of critical services.
  • Define and monitor service level indicators (SLIs), objectives (SLOs), and agreements (SLAs) to ensure business requirements are met.
  • Proactively identify and resolve performance bottlenecks and system inefficiencies.

Incident Management & Response:

  • Establish and improve incident management processes and on-call rotations.
  • Lead incident response and root cause analysis for high-priority outages.
  • Drive post-incident reviews and ensure actionable insights are implemented.

Automation & Tooling:

  • Develop and implement automated solutions to reduce manual operational tasks.
  • Enhance system observability through metrics, logging, and distributed tracing tools (e.g., Prometheus, Grafana, Elastic APM).
  • Optimize CI/CD pipelines for seamless deployments.

Collaboration:

  • Partner with software engineering teams to improve the reliability of applications and infrastructure.
  • Work closely with product and engineering teams to design scalable and robust systems.
  • Ensure seamless integration of monitoring and alerting systems across teams.

Leadership & Team Building:

  • Manage, mentor, and grow a team of SREs.
  • Promote SRE best practices and foster a culture of reliability and performance across the organization.
  • Drive performance reviews, skills development, and career progression for team members.

Capacity Planning & Cost Optimization:

  • Perform capacity planning and implement autoscaling solutions to handle traffic spikes.
  • Optimize infrastructure and cloud costs while maintaining reliability and performance.

Skills & Qualifications:

Required Skills:

  • Technical Expertise:
      ◦ Experience with cloud platforms (AWS, Azure, or GCP) and Kubernetes.
      ◦ Hands-on knowledge of infrastructure-as-code tools such as Terraform, Helm, or Ansible.
      ◦ Proficiency in Java.
      ◦ Expertise in distributed systems, databases, and load balancing.
  • Monitoring & Observability:
      ◦ Proficiency with tools such as Prometheus, Grafana, Elastic APM, or New Relic.
      ◦ Understanding of metrics-driven approaches to system monitoring and alerting.
  • Automation & CI/CD:
      ◦ Hands-on experience with CI/CD pipelines (e.g., Jenkins, Azure Pipelines).
      ◦ Skilled in automation frameworks and tools for infrastructure and application deployments.
  • Incident Management:
      ◦ Proven track record of handling incidents, running post-mortems, and implementing solutions to prevent recurrence.

Leadership & Communication Skills:

  • Strong people management and leadership skills with the ability to inspire and motivate teams.
  • Excellent problem-solving and decision-making skills.
  • Clear and concise communication, with the ability to translate technical concepts for non-technical stakeholders.

Preferred Skills:

  • Experience with database optimization, Kafka, or other messaging systems.
  • Knowledge of autoscaling techniques.
  • Previous experience in an SRE, DevOps, or infrastructure engineering leadership role.
  • Understanding of compliance and security best practices in distributed systems.

Why Join Us?

  • Be a key driver in building and scaling reliable systems in a fast-paced environment.
  • Work with cutting-edge technologies and influence the evolution of the infrastructure.
  • Lead a high-impact team and foster a culture of reliability and innovation.
