
Site Reliability Engineering Specialist
3 days ago
As a Senior Site Reliability Engineer, you will be a hands-on technical expert driving the reliability, scalability, and availability of the engineering platform. Working collaboratively across teams, you will develop and implement automated solutions, address operational challenges, and ensure the platform's robust performance. This role demands strong technical acumen, a proactive mindset, and the ability to influence platform improvements through technical excellence.
What you'll be doing
Platform Stability and Reliability
• Ensure the platform meets performance, availability, and reliability SLAs.
• Proactively identify and resolve performance bottlenecks and risks in production environments.
• Maintain and improve monitoring, logging, and alerting frameworks to detect and prevent incidents.
Incident Management
• Act as the primary responder for critical incidents, ensuring rapid mitigation and resolution.
• Conduct post-incident reviews and implement corrective actions to prevent recurrence.
• Develop and maintain detailed runbooks and playbooks for operational excellence.
Automation and Efficiency
• Build and maintain tools to automate routine tasks, such as deployments, scaling, and failover.
• Contribute to CI/CD pipeline improvements for faster and more reliable software delivery.
• Write and maintain Infrastructure as Code (IaC) using tools like Pulumi or Terraform to provision and manage resources.
Collaboration and Mentorship
• Collaborate with SRE, CI/CD, Developer Experience, and Templates teams to improve the platform's reliability and usability.
• Mentor junior engineers by sharing knowledge and best practices in SRE and operational excellence.
• Partner with developers to integrate observability and reliability into their applications.
Observability and Metrics
• Implement and optimize observability tools like Dynatrace, Prometheus, or Grafana for deep visibility into system performance.
• Define key metrics and dashboards to track the health and reliability of platform components.
• Continuously analyze operational data to identify and prioritize areas for improvement.
Required:
• 8+ years of experience in site reliability engineering, software engineering, or a related field.
• Demonstrated expertise in managing and optimizing cloud-based environments, with 3+ years of experience in AWS.
• Strong programming skills in one or more of the following languages: Python, Java, or TypeScript.
• Hands-on experience with containerization and orchestration technologies (e.g., Kubernetes, Docker).
• Proficiency in CI/CD practices and tools, such as GitLab, Jenkins, or similar.
• Familiarity with monitoring, logging, and alerting tools; experience with Dynatrace is a plus.
Preferred:
• Hands-on experience with Kubernetes (K8s) for container orchestration and deployment.
• Familiarity with monitoring and observability tools like Dynatrace, Prometheus, or similar.
• Exposure to agile development practices and collaborative environments.
• Experience working with other cloud platforms (e.g., Azure or Google Cloud) is a plus.
Looking in:
Leading inclusively and safely
I inspire and build trust through self-awareness, honesty and integrity.
Owning outcomes
I take the right decisions that benefit the broader organisation.
Looking out:
Delivering for the customer
I execute brilliantly on clear priorities that add value to our customers and the wider business.
Commercially savvy
I demonstrate strong commercial focus, bringing an external perspective to decision-making.
Looking to the future:
Growth mindset
I experiment and identify opportunities for growth for both myself and the organisation.
Building for the future
I build diverse future-ready teams where all individuals can be at their best.
About us
BT Group was the world's first telco and our heritage in the sector is unrivalled. As home to several of the UK's most recognised and cherished brands – BT, EE, Openreach and Plusnet, we have always played a critical role in creating the future, and we have reached an inflection point in the transformation of our business.
Over the next two years, we will complete the UK's largest and most successful digital infrastructure project – connecting more than 25 million premises to full fibre broadband. Together with our heavy investment in 5G, we play a central role in revolutionising how people connect with each other.
While we are through the most capital-intensive phase of our fibre investment, meaning we can reward our shareholders for their commitment and patience, we are absolutely focused on how we organise ourselves in the best way to serve our customers in the years to come. This includes radical simplification of systems, structures, and processes on a huge scale. Together with our application of AI and technology, we are on a path to creating the UK's best telco, reimagining the customer experience and relationship with one of this country's biggest infrastructure companies.
Change on the scale we will all experience in the coming years is unprecedented. BT Group is committed to being the driving force behind improving connectivity for millions and there has never been a more exciting time to join a company and leadership team with the skills, experience, creativity, and passion to take this company into a new era.
A FEW POINTS TO NOTE:
Although these roles are listed as full-time, if you're in a job-share partnership, work reduced hours, or work flexibly in any other way, please still get in touch.
We will also offer reasonable adjustments for the selection process if required, so please do not hesitate to inform us.
DON'T MEET EVERY SINGLE REQUIREMENT?
Studies have shown that women and people who are disabled, LGBTQ+, neurodiverse or from ethnic minority backgrounds are less likely to apply for jobs unless they meet every single qualification and criterion. We're committed to building a diverse, inclusive, and authentic workplace where everyone can be their best, so if you're excited about this role but your past experience doesn't align perfectly with every requirement in the Job Description, please apply anyway; you may be just the right candidate for this or other roles in our wider team.
-
Site Reliability Engineer
2 days ago
Bengaluru, Karnataka, India
Enterprise Minds, Inc
Full time
We're Hiring | Site Reliability Engineer | 8-10 years
-
Site Reliability Engineer
3 weeks ago
Bengaluru, Karnataka, India
Synechron
Full time
We have an immediate opportunity for an SRE (Senior Site Reliability Engineer) with 5 to 9 years of experience.
Synechron – Bangalore
Job Role: SRE (Senior Site Reliability Engineer)
Job Location: Bangalore
Notice Period: Within 30 days
About Synechron
We began life in 2001 as a small, self-funded team of technology specialists. Since then, we've grown our organization to 14,500+...
-
Site Reliability Engineer
4 days ago
Bengaluru, Karnataka, India
WhiteLotus Talent Partners
Full time
We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high...
-
Site Reliability Engineer
5 days ago
Bengaluru, Karnataka, India
beBeeReliability
Full time
₹ 2,00,00,000 - ₹ 2,50,00,000
Role Overview
As a Site Reliability Engineer, you will play a pivotal role in driving innovation and modernizing complex systems by leveraging cutting-edge technologies and collaborating with cross-functional teams.
-
Site Reliability Engineer
1 day ago
Bengaluru, Karnataka, India
Coforge
Full time
Job Description
• Design, implement, and maintain scalable infrastructure to ensure high availability and performance of software applications.
• Collaborate with development teams to identify and resolve issues affecting application performance, stability, and reliability.
• Develop automated monitoring scripts using tools like Prometheus, Grafana, etc. to...
-
Site Reliability Engineering
1 day ago
Bengaluru, Karnataka, India
Infrasoft Technologies Limited
Full time
Job Title: Developer
Work Location: Bangalore, Karnataka
Experience Range: 6-8 years
Job Description:
We are looking for a skilled Developer with strong hands-on experience in Site Reliability Engineering (SRE), Java, JavaScript, and Production Support. The ideal candidate should have a solid background in application monitoring and troubleshooting...
-
Site Reliability Engineer
2 weeks ago
Bengaluru, Karnataka, India
Collabera
Full time
Job Description
As a Principal/Chief Site Reliability Engineer, you will play a critical role in designing, developing, and maintaining scalable and highly reliable systems. You'll work closely with development teams to improve system reliability, monitor critical applications, and design fail-proof infrastructure.
Responsibilities
Design and implement...
-
Site Reliability Engineer
2 days ago
Bengaluru, Karnataka, India
Xebia
Full time
We are seeking an experienced AWS DevOps Engineer with strong expertise in Observability and Site Reliability Engineering (SRE) to design, build, and manage scalable, reliable, and secure cloud environments. The role requires hands-on experience with AWS services, Infrastructure as Code (IaC), CI/CD, monitoring & observability frameworks, and incident...