Lead Software Engineer, Cloud Site Reliability
4 days ago
Job Description
About the CloudOps Team:
The CloudOps team is responsible for the availability, reliability, performance, monitoring, emergency response, and capacity planning of Icertis SaaS applications and related services. CloudOps executes infrastructure and access provisioning, upgrades, deployments, and change management to drive faster time to market. The team plays a critical role in building and executing the company's cloud strategy, driving architectural improvements to enhance scalability and optimize overall cost.
Responsibilities
Role Responsibilities:
- Lead and execute large-scale site reliability engineering initiatives to improve performance, reliability, and scalability in Azure, AWS, and GCP environments.
- Implement and manage Azure AKS, AWS ECS/EKS environments, Kubernetes, and Docker-based container management platforms for mission-critical applications.
- Build automation and operational workflows using cloud-native capabilities for provisioning, scaling, monitoring, and self-healing systems.
- Collaborate with engineering teams to design and deploy cloud-native, containerized applications with robust CI/CD pipelines.
- Drive early detection and prevention of incidents through improved telemetry, monitoring, and automated recovery mechanisms.
- Work closely with cloud providers to optimize offerings and leverage new features for reliability and cost efficiency.
- Mentor and guide junior engineers in best practices for SRE, Kubernetes management, and automation.
Qualifications
Required Skills:
- 8–12 years of experience in Cloud Operations / SRE roles in mission-critical, 24x7 SaaS environments.
- Strong hands-on experience with Azure Kubernetes Service (AKS), AWS ECS/EKS, Kubernetes, Docker, and container lifecycle management.
- Proficiency in creating infrastructure automation using cloud-native tools, Helm charts, ARM templates, Terraform, or similar IaC frameworks.
- Strong understanding of cloud compute, storage, networking, and container orchestration concepts.
- Scripting skills in PowerShell, Python, Bash, or similar languages.
- Experience with CI/CD pipelines, monitoring solutions (Prometheus, Grafana, Azure Monitor), and log management systems.
- Proven expertise in cloud operations, SRE/DevOps practices, automation (IaC, CI/CD), observability, and AIOps, leveraging AI/ML for predictive monitoring, incident correlation, anomaly detection, and self-healing in cloud-native environments.
- Excellent problem-solving, communication, and collaboration skills.
About Us
Icertis is the global leader in AI-powered contract intelligence. The Icertis platform revolutionizes contract management, equipping customers with powerful insights and automation to grow revenue, control costs, mitigate risk, and ensure compliance - the pillars of business success. Today, more than one third of the Fortune 100 trust Icertis to realize the full intent of millions of commercial agreements in 90+ countries.
About The Team
Who we are: Icertis is the only contract intelligence platform companies trust to keep them out in front, now and in the future. Our unwavering commitment to contract intelligence is grounded in our FORTE values—Fairness, Openness, Respect, Teamwork and Execution—which guide all our interactions with employees, customers, partners, and stakeholders. Because in our mission to be the contract intelligence platform of the world, we believe how we get there is as important as the destination.
Icertis, Inc. provides Equal Employment Opportunity to all employees and applicants for employment without regard to race, color, religion, gender identity or expression, sex, sexual orientation, national origin, age, disability, genetic information, marital status, amnesty, or status as a covered veteran in accordance with applicable federal, state and local laws. Icertis, Inc. complies with applicable state and local laws governing non-discrimination in employment in every location in which the company has facilities. If you are in need of accommodation or special assistance to navigate our website or to complete your application, please send an e-mail with your request to or get in touch with your recruiter.
-
Site Reliability Engineer
2 days ago
Pune, Maharashtra, India Ather Energy Full time ₹ 6,00,000 - ₹ 18,00,000 per year You'll be our: Site Reliability Engineer. You'll be based at: Pune Zonal Office. You'll be aligned with: Cloud and Data Platform Lead / Cloud Architect. You'll be a member of: Cloud and Data Platform Team. Ather's fleet of smart scooters is growing rapidly, and so is the volume of data they generate. Our Vehicle Data Platform (VDP) is the core of this ecosystem, and...
-
Site Reliability Engineer
2 weeks ago
Pune, Maharashtra, India Techverito Software Solutions LLP Full time ₹ 8,00,000 - ₹ 24,00,000 per year Job Description: 3-5 years of proven and progressive experience as an SRE or DevOps Engineer. As an SRE, you will have a strong background in cloud infrastructure management and deployment, with expertise in AWS cloud, DevOps tools, and the Kubernetes ecosystem. The primary focus of this role will be to design, implement, and manage our cloud...
-
Site Reliability Engineer
1 week ago
Pune, Maharashtra, India Talent Worx Full time ₹ 15,00,000 - ₹ 25,00,000 per year Site Reliability Engineer (SRE): At Talent Worx, we are looking for a dedicated Site Reliability Engineer (SRE) to join our team. This role involves maintaining high availability and reliability of our services through the application of software engineering practices and systems administration skills. The ideal candidate will bridge the gap between...
-
Senior Software Engineer, Reliability
2 weeks ago
Pune, Maharashtra, India Veeam Software Full time ₹ 8,00,000 - ₹ 24,00,000 per year Veeam, the #1 global market leader in data resilience, believes businesses should control all their data whenever and wherever they need it. Veeam provides data resilience through data backup, data recovery, data portability, data security, and data intelligence. Based in Seattle, Veeam protects over 550,000 customers worldwide who trust Veeam to keep their...
-
SRE (Site Reliability Engineer)
24 hours ago
Pune, Maharashtra, India Apex One Full time ₹ 6,00,000 - ₹ 18,00,000 per year Job Overview: We are looking for a detail-oriented and experienced Site Reliability Engineer to join our team. The Site Reliability Engineer will be responsible for creating and implementing scalable software solutions in order to meet system and application performance goals. You will also be responsible for troubleshooting system errors and resolving any...
-
Cloud Site Reliability Engineer
2 weeks ago
Pune, Maharashtra, India NiCE Full time US$ 1,00,000 - US$ 1,50,000 per year At NiCE, we don't limit our challenges. We challenge our limits. Always. We're ambitious. We're game changers. And we play to win. We set the highest standards and execute beyond them. And if you're like us, we can offer you the ultimate career opportunity that will light a fire within you. So, what's the role all about? NICE Public Safety has expanded...
-
Site Reliability Engineer
2 weeks ago
Pune, Maharashtra, India Growel Softech Pvt. Ltd. Full time ₹ 12,96,000 - ₹ 1,51,20,000 per year Job Title: Site Reliability Engineer. Location: Pune (Hybrid - 3 days a week at office, 2 days WFH; candidate needs to report to the Pune office only) (Relocation is considerable). Shift Timings: 12:30 PM - 9:30 PM IST. Budget: 10+ to 12+ yrs, 31 LPA; 13 to 15+ yrs, 36 LPA. Interview: 2 rounds (HM's availability is between 3 PM and 5 PM IST). Positions: 4. Considerable Notice Period - 30...
-
Site Reliability Engineer
2 days ago
Pune, Maharashtra, India Equifax Full time ₹ 10,00,000 - ₹ 25,00,000 per year Site Reliability Engineering (SRE) at Equifax is a discipline that combines software and systems engineering for building and running large-scale, distributed, fault-tolerant systems. SRE ensures that internal and external services meet or exceed reliability and performance expectations while adhering to Equifax engineering principles. SRE is also an...
-
Site Reliability Engineer
4 days ago
Pune, Maharashtra, India Rockwell Automation Full time Rockwell Automation is a global technology leader focused on helping the world's manufacturers be more productive, sustainable, and agile. With more than 28,000 employees who make the world better every day, we know we have something special. Behind our customers - amazing companies that help feed the world, provide life-saving medicine on a global scale,...
-
Site Reliability Engineer
6 days ago
Pune, Maharashtra, India Rockwell Automation Full time Rockwell Automation is a global technology leader focused on helping the world's manufacturers be more productive, sustainable, and agile. With more than 28,000 employees who make the world better every day, we know we have something special. Behind our customers - amazing companies that help feed the world, provide life-saving medicine on a global scale,...