Site Reliability Engineer
6 hours ago
Company Description
About us:
Metro Global Solution Center (MGSC) is the internal solution partner for METRO, a €29.8 billion international wholesaler with operations in 32 countries through 625 stores and a team of 91,000 people globally. METRO operates in a further 10 countries with its Food Service Distribution (FSD) business and is thus active in a total of 34 countries.
MGSC has locations in Pune (India), Düsseldorf (Germany), and Szczecin (Poland). We provide HR, Finance, IT, and Business operations support to 31 countries, speak 24+ languages, and process over 18,000 transactions a day. We are setting tomorrow's standards for customer focus, digital solutions, and sustainable business models. For over 10 years, we have been providing services and solutions from our two locations in Pune and Szczecin. This has given us extensive experience in how best to serve our internal customers with high quality and passion. We believe that we can add value, drive efficiency, and satisfy our customers.
Headquarters: Pune, Maharashtra, India
Type: Privately Held
Inception: 2011
Job Description
Role Overview
We are seeking a Senior Site Reliability Engineer with strong experience in building and maintaining scalable, resilient systems. The ideal candidate will have hands-on expertise in cloud-native technologies, infrastructure as code, observability, and automation, with a focus on Google Cloud Platform (GCP).
Key Responsibilities
- Ensure the stability and reliability of cloud-native applications deployed on GCP, containerized with Docker and orchestrated via Kubernetes.
- Define, implement, and monitor SLOs, SLAs, and SLIs to measure system performance and user experience.
- Automate infrastructure provisioning using Terraform and manage Kubernetes configurations with Kustomize and Helm.
- Develop and maintain monitoring and alerting systems using Datadog and GCP-native tools.
- Conduct incident analysis and postmortems to drive continuous improvement.
- Collaborate with development teams to integrate reliability practices into CI/CD pipelines using GitHub Actions.
- Manage and troubleshoot database systems, particularly PostgreSQL and Cassandra.
- Apply networking knowledge and Linux system administration skills to troubleshoot and optimize system connectivity and performance.
Qualifications
Education
Bachelor's or Master's degree in Computer Science, Software Engineering, or equivalent practical experience.
Work Experience & Skills
- 5+ years of experience in Site Reliability Engineering.
- Proven experience designing and operating elastic, resilient systems in cloud environments.
- Strong understanding of GCP, Kubernetes, and container orchestration.
- Proficiency in infrastructure as code and configuration management tools (Terraform, Helm, Kustomize).
- Experience with monitoring and observability tools (Datadog, GCP Monitoring).
- Solid scripting skills in Bash and familiarity with automation frameworks.
- Experience with CI/CD pipelines, especially using GitHub Actions.
- Familiarity with networking fundamentals and troubleshooting.
- Strong coding skills and ability to develop reliability-focused tooling.
- Excellent communication skills in English (written and spoken).
Other Requirements
- Strong problem-solving skills and a process-oriented mindset.
- Ability to work independently and collaboratively in a fast-paced environment.
- Passion for clean code, automation, and continuous improvement.
Nice-to-Have
- Familiarity with additional monitoring tools (e.g., Datadog, Prometheus, GCP Monitoring).
- Experience working in Agile/Scrum teams.