Zoop.One - Site Reliability Engineer - DevOps

1 week ago


Pune, India ZOOP Full time

Role : Site Reliability Engineer.

Location : Pune (on-site).

Experience : 3+ years.

We are looking for someone with experience setting up an in-house monitoring platform with a 99.99% uptime SLA using VictoriaMetrics and Prometheus across multiple regions.

Site Reliability Engineer Zoop.

The Opportunity :

We're seeking a Senior Site Reliability Engineer to elevate and standardize our reliability engineering practices. This role offers the opportunity to shape and optimize SRE practices in a high-growth fintech environment while working with cutting-edge technologies and critical identity verification services.

Key Responsibilities :

Standardization & Optimization :

- Assess and standardize existing monitoring and observability practices across NewRelic and Prometheus.

- Refine and formalize SLIs/SLOs for all solution offerings.

- Optimize current alerting strategies to improve signal-to-noise ratio.

- Document and standardize incident management processes.

- Create comprehensive runbooks for all critical services.

Reliability Engineering :

- Drive improvements to achieve and maintain 99.95% uptime for critical services.

- Optimize API response times to strengthen our "Fastest Platform" positioning.

- Implement advanced chaos engineering practices.

- Enhance existing automation and self-healing capabilities.

- Standardize disaster recovery and business continuity procedures.

Infrastructure Excellence :

- Optimize our GCP/Kubernetes infrastructure (and AWS where applicable) for enhanced reliability.

- Standardize Infrastructure as Code (IaC) practices across teams.

- Identify and automate remaining manual operational tasks.

- Build advanced tooling for monitoring, deployment, and troubleshooting.

- Drive cloud cost optimization initiatives.

- Prepare for potential self-hosting scenarios, including operating Grafana, Prometheus, VictoriaMetrics, and log stacks such as Loki and Elastic.

Security & Compliance :

- Ensure all reliability practices meet ISO 27001:2022, ISO 27017:2015, ISO 27018:2019, ISO 27701:2019, and SOC 2 Type II requirements (with a pragmatic, risk-based approach).

- Enhance security monitoring and anomaly detection.

- Standardize secure CI/CD practices across the organization.

- Implement comprehensive audit and compliance reporting.

Collaboration & Process Improvement :

- Partner with the Platform team to enhance and standardize existing SRE workflows.

- Collaborate with 50+ developers to strengthen reliability culture.

- Lead blameless post-mortems and drive systematic improvements.

- Establish SRE best practices and knowledge-sharing sessions.

- Build a roadmap for eventual SRE team expansion.

Technical Requirements :

Must-Have Skills :

- Experience : 3+ years in SRE, DevOps, or similar roles with a focus on standardizing and scaling practices.

- Cloud Expertise : Deep hands-on experience with Google Cloud Platform (GCP) and Amazon Web Services (AWS).

- Container Orchestration : Advanced Kubernetes and Docker skills in production environments.

- Programming : Proficiency in at least two of Go, Python, and TypeScript, plus strong shell-scripting abilities.

- Operating Systems : Expert-level Linux knowledge and tuning.

- Monitoring : Expert-level knowledge of Prometheus and NewRelic.

- IaC : Strong experience with Terraform or similar tools.

- Process Excellence : Proven track record of standardizing SRE practices.

Preferred Qualifications :

- Experience in fintech, banking, or other high-security environments.

- Knowledge of ISO 27001, SOC 2, and related compliance requirements.

- Experience optimizing API reliability at scale (millions of requests/day).

- Background in maturing existing SRE practices.

- Familiarity with identity verification or fraud detection systems.

- GCP Professional Cloud Architect or DevOps Engineer certification.

- Experience running self-hosted observability stacks (Grafana, Prometheus, VictoriaMetrics, Loki, Elastic).


(ref:hirist.tech)
