
Manager - Site Reliability Engineering
7 days ago
About Us
Zycus, recognized by leading analyst firms in procurement technology, empowers teams to unlock deep value through its comprehensive Source-to-Pay (S2P) solutions. At the heart of our S2P solution is the Merlin Agentic Platform, which orchestrates intelligent AI agents to deliver simplified, efficient, and compliant processes.
The Merlin Intake Agent offers business users unparalleled ease of use, increasing adoption rates and significantly reducing non-compliant spending. For procurement teams, the Merlin Autonomous Negotiation Agent handles tail spend autonomously, securing additional savings; the Merlin Contract Agent helps draft compliant contracts and reduces risks by actively monitoring them; and the Merlin AP Agent further enhances efficiency by automating invoice processing with exceptional speed and accuracy.
We Are An Equal Opportunity Employer:
Zycus is committed to providing equal opportunities in employment and creating an inclusive work environment. We do not discriminate against applicants on the basis of race, color, religion, gender, sexual orientation, national origin, age, disability, or any other legally protected characteristic. All hiring decisions will be based solely on qualifications, skills, and experience relevant to the job requirements.
Job Description
Zycus is looking for a Site Reliability Engineer (SRE) with deep expertise in Kubernetes, automation, and Linux systems. The ideal candidate will have hands-on experience in deploying, administering, and optimizing large-scale production systems, with a strong focus on microservices architecture and on ensuring automation, performance, and reliability across our SaaS platform.
Roles and Responsibilities:
- System Reliability & Uptime: Ensure high availability, performance, and reliability of applications and infrastructure.
- Kubernetes & Cluster Management: Deploy, administer, and maintain Kubernetes clusters, managing scaling, upgrades, and troubleshooting.
- Microservices Management: Handle the deployment, monitoring, and scaling of microservices in distributed environments.
- Incident Management: Respond to production incidents, perform root cause analysis, and implement long-term fixes to prevent recurrence.
- Automation & Infrastructure as Code (IaC): Automate repetitive tasks, infrastructure provisioning, and deployment workflows using tools like Ansible and Terraform.
- Monitoring & Observability: Implement and maintain monitoring tools (e.g., Prometheus, Grafana, Datadog) to track system health and application performance.
- Performance Optimization: Analyze system performance, identify bottlenecks, and optimize resources for better efficiency.
- Disaster Recovery & Backup: Design and implement backup and disaster recovery (DR) strategies for business continuity.
- Capacity Planning: Forecast infrastructure needs based on performance trends and business growth to ensure scalability.
- Security & Compliance: Ensure infrastructure and applications meet security standards and compliance requirements.
- Collaboration with Dev & Ops Teams: Work closely with development and operations teams to improve deployment pipelines, release processes, and system reliability.
- Documentation: Maintain clear and detailed documentation of systems, processes, and incident reports for knowledge sharing and compliance.
- Continuous Improvement: Identify opportunities for improving system architecture, deployment strategies, and automation workflows.
- Cloud Infrastructure Management: Manage cloud services (AWS, GCP, Azure) for resource optimization, cost management, and automation.
- On-Call Support: Participate in on-call rotations to handle urgent production issues and ensure rapid recovery.
Job Requirements
- Experience: 5 to 12 years
- Technical skills as listed below:
Must Have:
1. Kubernetes Expertise:
Hands-on experience with installing and provisioning Kubernetes clusters.
Deep understanding of core Kubernetes components such as CRI, CNI, etcd, CoreDNS, and kube-proxy.
Strong knowledge of Kubernetes internal networking, service discovery, and ingress management.
2. Kubernetes Distributions:
Hands-on experience with different Kubernetes provisioners and distributions.
3. Kubernetes Cluster Administration:
Experience in administering production Kubernetes clusters, including backup and disaster recovery (DR) strategies.
Familiarity with cluster health monitoring and troubleshooting issues.
4. Monitoring Tools: Exposure to monitoring tools such as Prometheus, Grafana, Datadog, or AppDynamics.
5. Automation & Scripting:
Strong programming skills in Python, Shell, or similar languages.
Hands-on experience with Infrastructure-as-Code (IaC) tools such as Terraform or Ansible.
Cloud automation experience, ideally with AWS or other major cloud platforms.
6. Operating Systems: Hands-on experience with Linux system administration.
- Microservices: Experience with microservices architecture and managing more than 50 microservices simultaneously.
Good to Have Skills:
- Experience with OpenShift virtualization in production environments.
- Knowledge of AWS EKS, Rancher, or other Kubernetes distributions.
- CKA (Certified Kubernetes Administrator) certification or equivalent.
- Experience in fine-tuning RHEL, CentOS, and Ubuntu.
- Familiarity with DevSecOps practices, container security, and compliance frameworks.
Five Reasons Why You Should Join Zycus
Industry Recognized Leader: Zycus is recognized by Gartner (the world's leading market research analyst) as a Leader in Procurement Software Suites. Zycus is also recognized as a Customer First Organization by Gartner. Zycus's Procure-to-Pay Suite scores 4.5 out of 5 in Gartner Peer Insights for Procure-to-Pay Suites.
Pioneer in Cognitive Procurement: Zycus is a pioneer in Cognitive Procurement software and has been a trusted partner of choice for large global enterprises.
Fast Growing: Growing in the region at a rate of 30% Y-o-Y.
Global Enterprise Customers: Work with large enterprise customers globally to drive complex global implementations on the Zycus value framework.
AI Product Suite: Steer the next-generation cognitive product suite offering.
About Us
Zycus is a pioneer in Cognitive Procurement software and has been a trusted partner of choice for large global enterprises for two decades. Zycus has been consistently recognized by Gartner, Forrester, and other analysts for its Source to Pay integrated suite. Zycus powers its S2P software with the revolutionary Merlin AI Suite. Merlin AI takes over tactical tasks and empowers procurement and AP officers to focus on strategic projects, offers data-driven, actionable insights for quicker and smarter decisions, and provides a B2C-type user experience to end users through its conversational AI.
Zycus helps enterprises drive real savings, reduce risks, and boost compliance, and its seamless, intuitive, and easy-to-use user interface ensures high adoption and value across the organization.
Start your #CognitiveProcurement journey with us, as you are #MeantforMore