
High Salary Lead Site Reliability Engineer
1 day ago
Job Title: SRE Lead (Engineering & Reliability)
Experience: 8-12 years
Job Summary:
We are seeking an experienced and dynamic Site Reliability Engineering (SRE) Lead to
oversee the reliability, scalability, and performance of our critical systems. As an SRE Lead,
you will play a pivotal role in establishing and implementing SRE practices, leading a team
of engineers, and driving automation, monitoring, and incident response strategies. This
position combines software engineering and systems engineering expertise to build and
maintain high-performing, reliable systems.
Key Responsibilities:
Reliability & Performance:
• Lead efforts to maintain high availability and reliability of critical services.
• Define and monitor SLIs, SLOs, and SLAs to ensure business requirements are met.
• Proactively identify and resolve performance bottlenecks and system inefficiencies.
Incident Management & Response:
• Establish and improve incident management processes and on-call rotations.
• Lead incident response and root cause analysis for high-priority outages.
• Drive post-incident reviews and ensure actionable insights are implemented.
Automation & Tooling:
• Develop and implement automated solutions to reduce manual operational tasks.
• Enhance system observability through metrics, logging, and distributed tracing tools
(e.g., Prometheus, Grafana, Elastic APM).
• Optimize CI/CD pipelines for seamless deployments.
Collaboration:
• Partner with software engineering teams to improve the reliability of applications and
infrastructure.
• Work closely with product and engineering teams to design scalable and robust systems.
• Ensure seamless integration of monitoring and alerting systems across teams.
Leadership & Team Building:
• Manage, mentor, and grow a team of SREs.
• Promote SRE best practices and foster a culture of reliability and performance across
the organization.
• Drive performance reviews, skills development, and career progression for team
members.
Capacity Planning & Cost Optimization:
• Perform capacity planning and implement autoscaling solutions to handle traffic
spikes.
• Optimize infrastructure and cloud costs while maintaining reliability and
performance.
Skills & Qualifications:
• Technical Expertise:
o Experience with cloud platforms (AWS / Azure / GCP) and Kubernetes.
o Hands-on knowledge of infrastructure-as-code tools like Terraform, Helm, or Ansible.
o Proficiency in Java.
o Expertise in distributed systems, databases, and load balancing.
• Monitoring & Observability:
o Proficient with tools like Prometheus, Grafana, Elastic APM, or New Relic.
o Understanding of metrics-driven approaches for system monitoring and alerting.
• Automation & CI/CD:
o Hands-on experience with CI/CD pipelines (e.g., Jenkins, Azure Pipelines).
o Skilled in automation frameworks and tools for infrastructure and application deployments.
• Incident Management:
o Proven track record in handling incidents, post-mortems, and implementing
solutions to prevent recurrence.
Leadership & Communication Skills:
• Strong people management and leadership skills with the ability to inspire and motivate teams.
• Excellent problem-solving and decision-making skills.
• Clear and concise communication, with the ability to translate technical concepts for non-technical stakeholders.
Preferred Qualifications:
• Experience with database optimization, Kafka, or other messaging systems.
• Knowledge of autoscaling techniques.
• Previous experience in an SRE, DevOps, or infrastructure engineering leadership role.
• Understanding of compliance and security best practices in distributed systems.
Why Join Us?
• Be a key driver in building and scaling reliable systems in a fast-paced environment.
• Work with cutting-edge technologies and influence the evolution of the infrastructure.
• Lead a high-impact team and foster a culture of reliability and innovation.
-
High Salary Site Reliability Engineering Manager
3 weeks ago
Bengaluru, Karnataka, India Apple Full time
Imagine what we could do together. At Apple, new ideas have a way of becoming excellent products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. The people here at Apple don't just build products - they craft the kind of wonder that's revolutionized entire industries...
-
High Salary Site Reliability Engineer
3 weeks ago
Bengaluru, Karnataka, India BNP Paribas Full time
Dear Candidate,
BNP Paribas is hiring a Site Reliability Engineer for the Bangalore location. Kindly apply on the link below as soon as possible if interested; we shall take your candidature forward once the application is submitted: https://bwelcome.hr.bnpparibas/su/cba292db5cf89f02
Technical & Behavioral Competencies: Mandatory skills: Site Reliability Engineer, DevOps, Jenkins,...
-
Site Reliability Engineer
4 days ago
Bengaluru, Karnataka, India WhiteLotus Talent Partners Full time
We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high...
-
Site Reliability Engineer
23 hours ago
Bengaluru, Karnataka, India Coforge Full time
Job Description
- Design, implement, and maintain scalable infrastructure to ensure high availability and performance of software applications.
- Collaborate with development teams to identify and resolve issues affecting application performance, stability, and reliability.
- Develop automated monitoring scripts using tools like Prometheus, Grafana, etc. to...
-
Site Reliability Engineering Director
4 days ago
Bengaluru, Karnataka, India beBeeReliability Full time
Pearson is looking for a dynamic and experienced Manager - Site Reliability Engineering (SRE) to join our team. This individual will play a critical role in ensuring the stability, performance, and scalability of our infrastructure. If you possess excellent leadership skills, profound technical expertise, and the ability to thrive in a fast-paced,...
-
Site Reliability Engineer
3 weeks ago
Bengaluru, Karnataka, India Synechron Full time
We have an immediate opportunity for an SRE (Senior Site Reliability Engineer) with 5 to 9 years of experience.
Synechron – Bangalore
Job Role: SRE (Senior Site Reliability Engineer)
Job Location: Bangalore
Notice Period: Within 30 days
About Synechron
We began life in 2001 as a small, self-funded team of technology specialists. Since then, we've grown our organization to 14,500+...
-
Site Reliability Engineer
4 weeks ago
Bengaluru, Karnataka, India Synechron Full time
We have an immediate opportunity for an SRE (Senior Site Reliability Engineer) with 5 to 9 years of experience.
Synechron – Bangalore
Job Role: SRE (Senior Site Reliability Engineer)
Job Location: Bangalore
Notice Period: Within 30 days
About Synechron
We began life in 2001 as a small, self-funded team of technology specialists. Since then, we've grown our organization to...
-
Site Reliability Engineer
2 weeks ago
Bengaluru, Karnataka, India Synechron Full time
We have an immediate opportunity for an SRE (Senior Site Reliability Engineer) with 5 to 9 years of experience.
Synechron – Bangalore
Job Role: SRE (Senior Site Reliability Engineer)
Job Location: Bangalore
Notice Period: Within 30 days
About Synechron
We began life in 2001 as a small, self-funded team of technology specialists. Since then, we've grown our organization to...
-
Senior Site Reliability Engineer
4 weeks ago
Bengaluru, Karnataka, India Josys Full time
Senior Site Reliability Engineer (SRE)
About JOSYS: Josys, a dynamic B2B SaaS platform startup, has embarked on a mission to revolutionize IT operations globally, following an exceptional launch in Japan and securing $125 million in Series A and B funding. Our platform enables businesses to conquer the complexities of work-from-anywhere setups, rapid...
-
Site Reliability Engineer
4 weeks ago
Bengaluru, Karnataka, India Vestas Full time
Job Description
We are seeking a skilled Site Reliability Engineer (SRE) with a strong background in data infrastructure and machine learning operations (MLOps). The ideal candidate will be responsible for designing and deploying scalable solutions for high-volume data, building robust ETL/ELT pipelines, and ensuring the reliability of our systems. You will...