High Salary Lead Site Reliability Engineer

3 weeks ago


Bengaluru, Karnataka, India Landmark Group Full time
Company: Landmark Group

Job Title: SRE Lead (Engineering & Reliability)

Experience: 8-12 years

Job Summary:

We are seeking an experienced and dynamic Site Reliability Engineering (SRE) Lead to oversee the reliability, scalability, and performance of our critical systems. As an SRE Lead, you will play a pivotal role in establishing and implementing SRE practices, leading a team of engineers, and driving automation, monitoring, and incident response strategies. This position combines software engineering and systems engineering expertise to build and maintain high-performing, reliable systems.

Key Responsibilities:

Reliability & Performance:

• Lead efforts to maintain high availability and reliability of critical services.

• Define and monitor SLIs, SLOs, and SLAs to ensure business requirements are met.

• Proactively identify and resolve performance bottlenecks and system inefficiencies.

Incident Management & Response:

• Establish and improve incident management processes and on-call rotations.

• Lead incident response and root cause analysis for high-priority outages.

• Drive post-incident reviews and ensure actionable insights are implemented.

Automation & Tooling:

• Develop and implement automated solutions to reduce manual operational tasks.

• Enhance system observability through metrics, logging, and distributed tracing tools (e.g., Prometheus, Grafana, Elastic APM).

• Optimize CI/CD pipelines for seamless deployments.

Collaboration:

• Partner with software engineering teams to improve the reliability of applications and infrastructure.

• Work closely with product and engineering teams to design scalable and robust systems.

• Ensure seamless integration of monitoring and alerting systems across teams.

Leadership & Team Building:

• Manage, mentor, and grow a team of SREs.

• Promote SRE best practices and foster a culture of reliability and performance across the organization.

• Drive performance reviews, skills development, and career progression for team members.

Capacity Planning & Cost Optimization:

• Perform capacity planning and implement autoscaling solutions to handle traffic spikes.

• Optimize infrastructure and cloud costs while maintaining reliability and performance.

Skills & Qualifications:

• Technical Expertise:

o Experience with cloud platforms (AWS / Azure / GCP) and Kubernetes.

o Hands-on knowledge of infrastructure-as-code tools such as Terraform, Helm, or Ansible.

o Proficiency in Java.

o Expertise in distributed systems, databases, and load balancing.

• Monitoring & Observability:

o Proficient with tools such as Prometheus, Grafana, Elastic APM, or New Relic.

o Understanding of metrics-driven approaches for system monitoring and alerting.

• Automation & CI/CD:

o Hands-on experience with CI/CD pipelines (e.g., Jenkins, Azure Pipelines).

o Skilled in automation frameworks and tools for infrastructure and application deployments.

• Incident Management:

o Proven track record in handling incidents, post-mortems, and implementing solutions to prevent recurrence.

Leadership & Communication Skills:

• Strong people management and leadership skills with the ability to inspire and motivate teams.

• Excellent problem-solving and decision-making skills.

• Clear and concise communication, with the ability to translate technical concepts for non-technical stakeholders.

Preferred Qualifications:

• Experience with database optimization, Kafka, or other messaging systems.

• Knowledge of autoscaling techniques.

• Previous experience in an SRE, DevOps, or infrastructure engineering leadership role.

• Understanding of compliance and security best practices in distributed systems.

Why Join Us?

• Be a key driver in building and scaling reliable systems in a fast-paced environment.

• Work with cutting-edge technologies and influence the evolution of the infrastructure.

• Lead a high-impact team and foster a culture of reliability and innovation.

  • Bengaluru, Karnataka, India Xebia Full time

    Performance & Reliability Engineer (Senior, Lead, Principal & Manager). Hybrid. Location: Pune, Chennai, Bangalore & Gurgaon. Need immediate joiners only. Job description: Role: Performance & Reliability Engineer. Job Location: Gurgaon, Chennai, Pune, Bangalore (Hybrid). Job Overview: We are seeking a highly skilled and motivated Performance & Reliability Engineer to join...


  • Bengaluru, Karnataka, India Synechron Full time

    We have an immediate opportunity for an SRE (Senior Site Reliability Engineer) with 5 to 9 years of experience. Job Role: SRE (Senior Site Reliability Engineer). We began life in 2001 as a small, self-funded team of technology specialists. Innovative tech solutions for business: we're now a leading global digital consulting firm, providing innovative technology solutions for...


  • Bengaluru, Karnataka, India CloudHire Full time

    Job Summary: The Technical Manager for Site Reliability Engineering (SRE) will lead a remote team of Site Reliability Engineers, ensuring operational excellence and fostering a high-performing team culture. Reporting to the US-based Director of Systems and Security, this role is responsible for overseeing day-to-day operations, technical mentorship, and...


  • Bengaluru, Karnataka, India WhiteLotus Talent Partners Full time ₹ 9,00,000 - ₹ 12,00,000 per year

    We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high system availability,...


  • Bengaluru, Karnataka, India WOW Softech Full time ₹ 1,04,000 - ₹ 1,30,878 per year

    Job Title: SRE Lead (Engineering & Reliability). Job Summary: We are seeking an experienced and dynamic Site Reliability Engineering (SRE) Lead to oversee the reliability, scalability, and performance of our critical systems. As an SRE Lead, you will play a pivotal role in establishing and implementing SRE practices, leading a team of engineers, and driving...


  • Bengaluru, Karnataka, India Coforge Full time

    Job Description: - Design, implement, and maintain scalable infrastructure to ensure high availability and performance of software applications. - Collaborate with development teams to identify and resolve issues affecting application performance, stability, and reliability. - Develop automated monitoring scripts using tools like Prometheus, Grafana, etc. to...