Manager - Site Reliability Engineering

4 days ago

Mumbai, Maharashtra, India Zycus Infotech Full time ₹ 12,00,000 - ₹ 36,00,000 per year

About Us

Zycus, recognized by leading analyst firms in procurement technology, empowers teams to unlock deep value through its comprehensive Source-to-Pay (S2P) solutions. At the heart of our S2P solution is the Merlin Agentic Platform, which orchestrates intelligent AI agents to deliver simplified, efficient, and compliant processes.

The Merlin Intake Agent offers business users unparalleled ease of use, increasing adoption rates and significantly reducing non-compliant spending. For procurement teams, the Merlin Autonomous Negotiation Agent handles tail spend autonomously, securing additional savings; the Merlin Contract Agent helps draft compliant contracts and reduces risks by actively monitoring them; and the Merlin AP Agent further enhances efficiency by automating invoice processing with exceptional speed and accuracy.

We Are An Equal Opportunity Employer:

Zycus is committed to providing equal opportunities in employment and creating an inclusive work environment. We do not discriminate against applicants on the basis of race, color, religion, gender, sexual orientation, national origin, age, disability, or any other legally protected characteristic. All hiring decisions will be based solely on qualifications, skills, and experience relevant to the job requirements.

Job Description

Zycus is looking for a Site Reliability Engineer (SRE) with deep expertise in Kubernetes, automation, and Linux systems. The ideal candidate will have hands-on experience in deploying, administering, and optimizing large-scale production systems, with a strong focus on microservices architecture and on ensuring automation, performance, and reliability across our SaaS platform.

Roles and Responsibilities:

System Reliability & Uptime: Ensure high availability, performance, and reliability of applications and infrastructure.
Kubernetes & Cluster Management: Deploy, administer, and maintain Kubernetes clusters, managing scaling, upgrades, and troubleshooting.
Microservices Management: Handle the deployment, monitoring, and scaling of microservices in distributed environments.
Incident Management: Respond to production incidents, perform root cause analysis, and implement long-term fixes to prevent recurrence.
Automation & Infrastructure as Code (IaC): Automate repetitive tasks, infrastructure provisioning, and deployment workflows using tools like Ansible and Terraform.
Monitoring & Observability: Implement and maintain monitoring tools (e.g., Prometheus, Grafana, Datadog) to track system health and application performance.
Performance Optimization: Analyze system performance, identify bottlenecks, and optimize resources for better efficiency.
Disaster Recovery & Backup: Design and implement backup and disaster recovery (DR) strategies for business continuity.
Capacity Planning: Forecast infrastructure needs based on performance trends and business growth to ensure scalability.
Security & Compliance: Ensure infrastructure and applications meet security standards and compliance requirements.
Collaboration with Dev & Ops Teams: Work closely with development and operations teams to improve deployment pipelines, release processes, and system reliability.
Documentation: Maintain clear and detailed documentation of systems, processes, and incident reports for knowledge sharing and compliance.
Continuous Improvement: Identify opportunities for improving system architecture, deployment strategies, and automation workflows.
Cloud Infrastructure Management: Manage cloud services (AWS, GCP, Azure) for resource optimization, cost management, and automation.
On-Call Support: Participate in on-call rotations to handle urgent production issues and ensure rapid recovery.

Job Requirement

Experience: 5 to 12 years
Technical skills as mentioned below:

Must Have:

1. Kubernetes Expertise:

Hands-on experience with installing and provisioning Kubernetes clusters.

Deep understanding of core Kubernetes components such as CRI, CNI, etcd, CoreDNS, and kube-proxy.

Strong knowledge of Kubernetes internal networking, service discovery, and ingress management.

2. Kubernetes Distributions:

Hands-on experience with different Kubernetes provisioners and distributions.

3. Kubernetes Cluster Administration:

Experience in administering production Kubernetes clusters, including backup and disaster recovery (DR) strategies.

Familiarity with cluster health monitoring and troubleshooting issues.

4. Monitoring Tools: Exposure to monitoring tools such as Prometheus, Grafana, Datadog, or AppDynamics.

5. Automation & Scripting:

Strong programming skills in Python, Shell, or similar languages.

Hands-on experience with Infrastructure-as-Code (IaC) tools such as Terraform or Ansible.

Cloud automation experience, ideally with AWS or other major cloud platforms.

6. Operating Systems: Hands-on experience with Linux system administration.

7. Microservices: Experience with microservices architecture and managing more than 50 microservices simultaneously.

Good to Have Skills:

Experience with OpenShift virtualization in production environments.
Knowledge of AWS EKS, Rancher, or other Kubernetes distributions.
CKA (Certified Kubernetes Administrator) certification or equivalent.
Experience in fine-tuning RHEL, CentOS, and Ubuntu.
Familiarity with DevSecOps practices, container security, and compliance frameworks.

Five Reasons Why You Should Join Zycus

Industry Recognized Leader: Zycus is recognized by Gartner (world's leading market research analyst) as a Leader in Procurement Software Suites. Zycus is also recognized as a Customer First Organization by Gartner. Zycus's Procure to Pay Suite scores 4.5 out of 5 in Gartner Peer Insights for Procure-to-Pay Suites.
Pioneer in Cognitive Procurement: Zycus is a pioneer in Cognitive Procurement software and has been a trusted partner of choice for large global enterprises.
Fast Growing: The region is growing at a rate of 30% Y-o-Y.
Global Enterprise Customers: Work with large enterprise customers globally to drive complex global implementations on the Zycus value framework.
AI Product Suite: Steer the next-gen cognitive product suite offering.

About Us

Zycus is a pioneer in Cognitive Procurement software and has been a trusted partner of choice for large global enterprises for two decades. Zycus has been consistently recognized by Gartner, Forrester, and other analysts for its Source to Pay integrated suite. Zycus powers its S2P software with the revolutionary Merlin AI Suite. Merlin AI takes over tactical tasks so that procurement and AP officers can focus on strategic projects, offers data-driven, actionable insights for quicker and smarter decisions, and gives end users a B2C-type experience through its conversational AI.

Zycus helps enterprises drive real savings, reduce risks, and boost compliance, and its seamless, intuitive, and easy-to-use user interface ensures high adoption and value across the organization.

Start your #CognitiveProcurement journey with us, as you are #MeantforMore

Site Reliability Engineer

2 days ago

Mumbai, Maharashtra, India Ocean Flex International Full time ₹ 12,00,000 - ₹ 36,00,000 per year

Role: Site Reliability Engineer Exp.: 3+ Position Type: Contract Location: Mumbai, Maharashtra, India Mandatory Skills: IT Operations Management JOB DESCRIPTION Role Purpose Required Skills: · experience in system administration, application development, infrastructure development or related areas · experience with programming in languages like...
Site Reliability Engineering Manager

2 days ago

Mumbai, Maharashtra, India equentis Full time ₹ 12,00,000 - ₹ 36,00,000 per year

Job Title: Site Reliability Engineer (SRE) Company – Equentis Wealth Advisory Limited Location – Lower Parel, Mumbai Job Summary: We are seeking a talented Site Reliability Engineer (SRE) to join our team and play a critical role in ensuring the reliability, scalability, and performance of our systems and applications. The ideal candidate will have a strong...
Site Reliability Engineer

2 days ago

Mumbai, Maharashtra, India Aanseacore Full time ₹ 12,00,000 - ₹ 24,00,000 per year

We are seeking experienced Site Reliability Engineers (SREs) and CDN Specialists with deep expertise in global performance optimization, cloud infrastructure reliability, and edge computing. The ideal candidate will have a strong technical foundation in network performance engineering, Azure cloud operations, and CDN/edge delivery systems, ensuring...
Site Reliability Engineer III

2 days ago

Mumbai, Maharashtra, India JPMorganChase Full time ₹ 12,00,000 - ₹ 24,00,000 per year

Description: There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems. As a Site Reliability Engineer III at JPMorgan Chase within Corporate Technology, you will solve complex and broad business problems...
Site Reliability Engineer III

2 days ago

Mumbai, Maharashtra, India JPMorgan Chase Full time ₹ 12,00,000 - ₹ 36,00,000 per year

There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems. As a Site Reliability Engineer III at JPMorgan Chase within Corporate Technology, you will solve complex and broad business problems with...
Site Reliability Engineer

2 weeks ago

Mumbai, Maharashtra, India Hirexa Solutions Full time ₹ 4,00,000 - ₹ 12,00,000 per year

Hi all, we are hiring a Site Reliability Engineer for one of our product-based clients (permanent hiring). Skills: At least 7 years of experience on AWS. Good hands-on experience with the following skills: Observability/Monitoring, Python, Bash/Shell Script, Terraform, Automation, Account Pipeline, ServiceNow, GitLab, Jira. Exp: 7 to 14 Yrs. CTC: Exp*2.5...
Site Reliability Engineering Lead

7 days ago

Mumbai, Maharashtra, India RELX Full time ₹ 20,00,000 - ₹ 25,00,000 per year

Would you like to be part of a team that delivers high-quality software to our customers? Are you a visible champion with a 'can do' attitude and enthusiasm that inspires others? About The Business: LexisNexis Risk Solutions is the essential partner in the assessment of risk. Within our Business Services vertical, we offer a multitude of solutions focused on...
Site Reliability Engineer L1

2 days ago

Mumbai, Maharashtra, India Wipro Full time ₹ 6,00,000 - ₹ 18,00,000 per year

Job Description Job Title: Site Reliability Engineer L1 City: Mumbai State/Province: Maharashtra Posting Start Date: 11/26/25 Job Description: Role Purpose Required Skills: 5+ years of experience in system administration, application development, infrastructure development or related areas; 5+ years of experience with programming in languages like...
Site Reliability Engineering Lead

7 days ago

Mumbai, Maharashtra, India RELX Group Full time ₹ 12,00,000 - ₹ 36,00,000 per year

Would you like to be part of a team that delivers high-quality software to our customers? Are you a visible champion with a 'can do' attitude and enthusiasm that inspires others? About the Business: LexisNexis Risk Solutions is the essential partner in the assessment of risk. Within our Business Services vertical, we offer a multitude of solutions focused on...
Site Engineer

6 days ago

Mumbai, Maharashtra, India Par-pact Environmental Engineering Full time ₹ 12,00,000 - ₹ 18,00,000 per year

Responsibilities:
* Ensure project engineering excellence.
* Look after installation of site.
* Oversee site operations during execution phase.
* Manage subcontractors and vendors.
* Collaborate with cross-functional teams on projects.
Health insurance
Provident fund
