- Site Reliability Engineer

3 weeks ago

Pune, Maharashtra, India ZOOP Full time

Role : Site Reliability Engineer.

Location : Pune (on-site).

Experience : 3+ years.

We are looking for someone with experience setting up an in-house monitoring platform with a 99.99% uptime SLA, using VictoriaMetrics and Prometheus across multiple regions.

Site Reliability Engineer Zoop.

The Opportunity :

We're seeking a Senior Site Reliability Engineer to elevate and standardize our reliability engineering practices. This role offers the opportunity to shape and optimize SRE practices in a high-growth fintech environment while working with cutting-edge technologies and critical identity verification services.

Key Responsibilities :

Standardization & Optimization :

- Assess and standardize existing monitoring and observability practices across NewRelic and Prometheus.

- Refine and formalize SLIs/SLOs for all solution offerings.

- Optimize current alerting strategies to improve signal-to-noise ratio.

- Document and standardize incident management processes.

- Create comprehensive runbooks for all critical services.

Reliability Engineering :

- Drive improvements to achieve and maintain 99.95% uptime for critical services.

- Optimize API response times to strengthen our "Fastest Platform" positioning.

- Implement advanced chaos engineering practices.

- Enhance existing automation and self-healing capabilities.

- Standardize disaster recovery and business continuity procedures.

Infrastructure Excellence :

- Optimize our GCP/Kubernetes infrastructure (and AWS where applicable) for enhanced reliability.

- Standardize Infrastructure as Code (IaC) practices across teams.

- Identify and automate remaining manual operational tasks.

- Build advanced tooling for monitoring, deployment, and troubleshooting.

- Drive cloud cost optimization initiatives.

- Prepare for potential self-hosting scenarios, including operating Grafana, Prometheus, VictoriaMetrics, and log stacks such as Loki and Elastic.

Security & Compliance :

- Ensure all reliability practices meet ISO 27001:2022, ISO 27017:2015, ISO 27018:2019, ISO 27701:2019, and SOC 2 Type II requirements (with a pragmatic, risk-based approach).

- Enhance security monitoring and anomaly detection.

- Standardize secure CI/CD practices across the organization.

- Implement comprehensive audit and compliance reporting.

Collaboration & Process Improvement :

- Partner with the Platform team to enhance and standardize existing SRE workflows.

- Collaborate with 50+ developers to strengthen reliability culture.

- Lead blameless post-mortems and drive systematic improvements.

- Establish SRE best practices and knowledge-sharing sessions.

- Build a roadmap for eventual SRE team expansion.

Technical Requirements :

Must-Have Skills :

- Experience : 3+ years in SRE, DevOps, or similar roles with a focus on standardizing and scaling practices.

- Cloud Expertise : Deep hands-on experience with Google Cloud Platform (GCP) and Amazon Web Services (AWS).

- Container Orchestration : Advanced Kubernetes and Docker skills in production environments.

- Programming : Proficiency in at least two of Go, Python, and TypeScript, plus strong shell scripting abilities.

- Operating Systems : Expert-level Linux knowledge and tuning.

- Monitoring : Expert-level knowledge of Prometheus and NewRelic.

- IaC : Strong experience with Terraform or similar tools.

- Process Excellence : Proven track record of standardizing SRE practices.

Preferred Qualifications :

- Experience in fintech, banking, or other high-security environments.

- Knowledge of ISO 27001, SOC 2, and related compliance requirements.

- Experience optimizing API reliability at scale (millions of requests/day).

- Background in maturing existing SRE practices.

- Familiarity with identity verification or fraud detection systems.

- GCP Professional Cloud Architect or DevOps Engineer certification.

- Experience running self-hosted observability stacks (Grafana, Prometheus, VictoriaMetrics, Loki, Elastic).

(ref:hirist.tech)

Specialist - Site Reliability Engineer

7 hours ago

Pune, Maharashtra, India Accelya Group Full time ₹ 20,00,000 - ₹ 25,00,000 per year

For more than 40 years, Accelya has been the industry's partner for change, simplifying airline financial and commercial processes and empowering the air transport community to take better control of the future. Whether partnering with IATA on industry-wide initiatives or enabling digital transformation to simplify airline processes, Accelya drives the...

Specialist - Site Reliability Engineer

7 hours ago

Pune, Maharashtra, India Accelya Group Full time ₹ 15,00,000 - ₹ 25,00,000 per year

For more than 40 years, Accelya has been the industry's partner for change, simplifying airline financial and commercial processes and empowering the air transport community to take better control of the future. Whether partnering with IATA on industry-wide initiatives or enabling digital transformation to simplify airline processes, Accelya drives the...

Site Reliability Engineer

2 weeks ago

Pune, Maharashtra, India ENGEL Full time ₹ 6,00,000 - ₹ 18,00,000 per year

Company Description: ENGEL is a global leader in the production of injection moulding machines and their automation. The company produces systems that manufacture plastic parts used in various industries such as automotive, packaging, and consumer goods. With nine production plants worldwide and subsidiaries and representatives in over 85 countries, ENGEL...

Site Reliability Engineer

6 hours ago

Pune, Maharashtra, India Idox Full time ₹ 9,00,000 - ₹ 12,00,000 per year

Site Reliability Engineer (AWS), Pune, India. About the role: We are seeking a driven and detail-oriented Site Reliability Engineer (SRE) with a strong passion for building resilient, scalable cloud infrastructure. This role offers an exciting opportunity for professionals with 2 to 4 years of experience in DevOps, Cloud, or Infrastructure to deepen their...

Site Reliability Engineer

3 weeks ago

Pune, Maharashtra, India Reveille Technologies Full time

Job Summary : We are seeking a skilled and proactive Site Reliability Engineer (SRE) with a strong DevOps mindset and hands-on experience in application troubleshooting. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of our applications and infrastructure. This role requires a blend of software engineering,...

Site Reliability Engineer

3 weeks ago

Pune, Maharashtra, India Allianz Full time

Site Reliability Engineer (SRE) - One Identity Access Management. The primary objective of the Site Reliability Engineer (SRE) specializing in One Identity Access Management is to ensure the seamless operation, reliability, and scalability of IAM systems within the organization. This role is critical in maintaining system integrity, optimizing performance, and...

Site Reliability Engineer

3 weeks ago

Pune, Maharashtra, India Uplers Full time

Job Description. Must-have skills : Azure DevOps, SRE concepts, TerraData, CDC, CDC tool, NEWREL. Good-to-have skills : AWS CloudWatch. Reflections Info Systems (one of Uplers' clients) is looking for a Site Reliability Engineer who is passionate about their work, eager to learn and grow, and committed to delivering exceptional results. If you are a...

Site Reliability Engineering

10 hours ago

Pune, Maharashtra, India Deutsche Bank Full time ₹ 10,00,000 - ₹ 25,00,000 per year

Site Reliability Engineering (SRE) Lead, VP. Job ID: R0402474. Full/Part-Time: Full-time. Regular/Temporary: Regular. Location: Pune. Position Overview. Job Title: Site Reliability Engineering (SRE) Lead. Corporate Title: Vice President. Location: Pune, India. Role Description: We are seeking an experienced and highly capable Site Reliability Engineering (SRE) Lead to...

Site Reliability Engineer

3 weeks ago

Pune, Maharashtra, India LanceSoft, Inc Full time

Role and Responsibilities : Reporting to Engineering, the Site Reliability Engineer will play a critical role in driving innovation and growth for the Banking Solutions, Payments, and Capital Markets business. In this role, the candidate will have the opportunity to make a lasting impact on the company's transformation journey, drive customer-centric...

Site Reliability Engineer

2 weeks ago

Pune, Maharashtra, India Global Payments Inc. Full time US$ 80,000 - US$ 1,50,000 per year

Every day, Global Payments makes it possible for millions of people to move money between buyers and sellers using our payments solutions for credit, debit, prepaid and merchant services. Our worldwide team helps over 3 million companies, more than 1,300 financial institutions and over 600 million cardholders grow with confidence and achieve amazing...
