Manager - Site Reliability Engineering

6 days ago

Mumbai, Maharashtra, India | Zycus | Full time | ₹12,00,000 - ₹36,00,000 per year

About Us
Zycus, recognized by leading analyst firms in procurement technology, empowers teams to unlock deep value through its comprehensive Source-to-Pay (S2P) solutions. At the heart of our S2P solution is the Merlin Agentic Platform, which orchestrates intelligent AI agents to deliver simplified, efficient, and compliant processes.

The Merlin Intake Agent offers business users unparalleled ease of use, increasing adoption rates and significantly reducing non-compliant spending. For procurement teams, the Merlin Autonomous Negotiation Agent handles tail spend autonomously, securing additional savings; the Merlin Contract Agent helps draft compliant contracts and reduces risk by actively monitoring them; and the Merlin AP Agent further enhances efficiency by automating invoice processing with exceptional speed and accuracy.

We Are an Equal Opportunity Employer:
Zycus is committed to providing equal opportunities in employment and creating an inclusive work environment. We do not discriminate against applicants on the basis of race, color, religion, gender, sexual orientation, national origin, age, disability, or any other legally protected characteristic. All hiring decisions will be based solely on qualifications, skills, and experience relevant to the job requirements.

Job Description
Zycus is looking for a Site Reliability Engineer (SRE) with deep expertise in Kubernetes, automation, and Linux systems. The ideal candidate will have hands-on experience in deploying, administering, and optimizing large-scale production systems, with a strong focus on microservices architecture, ensuring automation, performance, and reliability across our SaaS platform.

*Roles And Responsibilities:*

System Reliability & Uptime: Ensure high availability, performance, and reliability of applications and infrastructure.
Kubernetes & Cluster Management: Deploy, administer, and maintain Kubernetes clusters, managing scaling, upgrades, and troubleshooting.
Microservices Management: Handle the deployment, monitoring, and scaling of microservices in distributed environments.
Incident Management: Respond to production incidents, perform root cause analysis, and implement long-term fixes to prevent recurrence.
Automation & Infrastructure as Code (IaC): Automate repetitive tasks, infrastructure provisioning, and deployment workflows using tools like Ansible and Terraform.
Monitoring & Observability: Implement and maintain monitoring tools (e.g., Prometheus, Grafana, Datadog) to track system health and application performance.
Performance Optimization: Analyze system performance, identify bottlenecks, and optimize resources for better efficiency.
Disaster Recovery & Backup: Design and implement backup and disaster recovery (DR) strategies for business continuity.
Capacity Planning: Forecast infrastructure needs based on performance trends and business growth to ensure scalability.
Security & Compliance: Ensure infrastructure and applications meet security standards and compliance requirements.
Collaboration with Dev & Ops Teams: Work closely with development and operations teams to improve deployment pipelines, release processes, and system reliability.
Documentation: Maintain clear and detailed documentation of systems, processes, and incident reports for knowledge sharing and compliance.
Continuous Improvement: Identify opportunities for improving system architecture, deployment strategies, and automation workflows.
Cloud Infrastructure Management: Manage cloud services (AWS, GCP, Azure) for resource optimization, cost management, and automation.
On-Call Support: Participate in on-call rotations to handle urgent production issues and ensure rapid recovery.

Job Requirements

Experience: 5 to 12 years
Technical skills:

*Must Have:*

Kubernetes Expertise:

Hands-on experience with installing and provisioning Kubernetes clusters.

Deep understanding of core Kubernetes components such as CRI, CNI, etcd, CoreDNS, and kube-proxy.

Strong knowledge of Kubernetes internal networking, service discovery, and ingress management.

Kubernetes Distributions:

Hands-on experience with different Kubernetes provisioners and distributions.

Kubernetes Cluster Administration:

Experience in administering production Kubernetes clusters, including backup and disaster recovery (DR) strategies.

Familiarity with cluster health monitoring and troubleshooting.

Monitoring Tools: Exposure to monitoring tools such as Prometheus, Grafana, Datadog, or AppDynamics.

Automation & Scripting:

Strong programming skills in Python, Shell, or similar languages.

Hands-on experience with Infrastructure-as-Code (IaC) tools such as Terraform or Ansible.

Cloud automation experience, ideally with AWS or other major cloud platforms.

Operating Systems: Hands-on experience with Linux system administration.
Microservices: Experience with microservices architecture and managing more than 50 microservices simultaneously.

*Good To Have Skills:*

Experience with OpenShift Virtualization in production environments.
Knowledge of AWS EKS, Rancher, or other Kubernetes distributions.
CKA (Certified Kubernetes Administrator) certification or equivalent.
Experience in fine-tuning RHEL, CentOS, and Ubuntu.
Familiarity with DevSecOps practices, container security, and compliance frameworks.

Five Reasons Why You Should Join Zycus

Industry-Recognized Leader: Zycus is recognized by Gartner (the world's leading market research firm) as a Leader in Procurement Software Suites. Zycus is also recognized as a Customer First Organization by Gartner. Zycus's Procure-to-Pay Suite scores 4.5 out of 5 in Gartner Peer Insights for Procure-to-Pay Suites.
Pioneer in Cognitive Procurement: Zycus is a pioneer in Cognitive Procurement software and has been a trusted partner of choice for large global enterprises.
Fast Growing: The region is growing at a rate of 30% year over year.
Global Enterprise Customers: Work with large enterprise customers globally to drive complex global implementations on Zycus's value framework.
AI Product Suite: Steer the next-generation cognitive product suite offering.

*About Us*
Zycus is a pioneer in Cognitive Procurement software and has been a trusted partner of choice for large global enterprises for two decades. Zycus has been consistently recognized by Gartner, Forrester, and other analysts for its Source-to-Pay integrated suite. Zycus powers its S2P software with the revolutionary Merlin AI Suite. Merlin AI takes over tactical tasks, empowering procurement and AP officers to focus on strategic projects; it offers data-driven, actionable insights for quicker and smarter decisions, and its conversational AI gives end users a B2C-type user experience.

Zycus helps enterprises drive real savings, reduce risks, and boost compliance, and its seamless, intuitive, and easy-to-use user interface ensures high adoption and value across the organization.

Start your #CognitiveProcurement journey with us, as you are #MeantforMore

