Site Reliability Engineer
24 hours ago
Please note that we will never request payment or bank account information at any stage of the recruitment process. As we continue to grow our teams, we urge you to be cautious of fraudulent job postings or recruitment activities that misuse our company name and information. Please protect your personal information during any recruitment process. While Monks may contact potential candidates via LinkedIn, all applications must be submitted through our official website.
Location: Remote / India
Experience Level: 5+ years
Type: Contractor (2 months)
About the Role
We are looking for a Site Reliability Engineer (SRE) with a strong background in observability, automation, and platform resilience to drive the operability and reliability of our Disaster Recovery as a Service (DRaaS) solution.
This role is essential in ensuring our DR environments are resilient, observable, and continuously improving. You'll collaborate with DR architects, security, infrastructure, and engineering teams to define SLIs/SLOs for critical systems, reduce operational toil, and lead efforts such as chaos engineering and game-day simulations.
Key Responsibilities
- Build and maintain observability dashboards and proactive alerting systems to monitor DR environments across Azure, AWS, and private cloud (e.g., HPE GreenLake).
- Define and track Service Level Indicators (SLIs) and Error Budgets aligned with strict RPO/RTO targets.
- Collaborate on runbook automation, synthetic testing, and validation pipelines for DR readiness.
- Lead chaos engineering initiatives and game-day exercises to proactively identify weak points and ensure high system resilience.
- Conduct post-incident reviews, implement feedback loops, and own the resulting automation backlog.
- Work with DR architecture and engineering teams to drive infrastructure as code (IaC) practices and platform reliability improvements.
- Participate in quarterly failover/failback simulations, monitor performance, and propose observability enhancements.
- Help define SLOs for protected application groups (Virtual Protection Groups, VPGs) and contribute to reporting for DR testing and compliance audits.
- Advocate for and implement best practices around toil reduction, incident response, and on-call efficiency.
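As an illustration of the SLI, error budget, and RPO tracking described in the responsibilities above, here is a minimal Python sketch. It is not part of the role's actual tooling: the names `SloWindow`, `error_budget_remaining`, and `meets_rpo`, the request-based SLI, and the sample figures are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts observed over one SLO window (hypothetical shape)."""
    total_requests: int
    failed_requests: int


def error_budget_remaining(window: SloWindow, slo_target: float) -> float:
    """Return the unspent fraction of the error budget (negative if overspent).

    slo_target is the availability objective as a fraction, e.g. 0.999.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if window.total_requests == 0:
        return 1.0  # no traffic in the window, so no budget was spent
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * window.total_requests
    return 1.0 - window.failed_requests / allowed_failures


def meets_rpo(replication_lag_seconds: float, rpo_seconds: float) -> bool:
    """True when the newest recoverable data is within the RPO target."""
    return replication_lag_seconds <= rpo_seconds


if __name__ == "__main__":
    # Assumed sample numbers: 1,000,000 requests, 250 failures, 99.9% SLO.
    window = SloWindow(total_requests=1_000_000, failed_requests=250)
    print(f"budget remaining: {error_budget_remaining(window, 0.999):.2%}")
    print(f"RPO met: {meets_rpo(replication_lag_seconds=45, rpo_seconds=60)}")
```

In practice the counts and replication lag would come from whichever observability stack is in use (for example Prometheus or Datadog), and the result would feed a dashboard or alert rather than a print statement.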
Requirements
- 5+ years of experience in SRE, DevOps, or Platform Engineering roles.
- Strong hands-on experience with observability tools like Grafana, Prometheus, Datadog, or Splunk.
- Experience designing and maintaining SLIs/SLOs, error budgets, and availability dashboards.
- Proficiency in at least one scripting or programming language (e.g., Python, Bash, Go).
- Knowledge of disaster recovery principles, RPO/RTO targets, and infrastructure failover practices.
- Experience with incident response, blameless postmortems, and tracking improvement actions.
- Familiarity with IaC tools such as Terraform, Ansible, or CloudFormation.
- Experience with CI/CD, automated testing, and cloud-native deployments in Azure or AWS.
- Strong problem-solving and collaboration skills, with the ability to work across cross-functional teams.
- Fluent in English (written and spoken).
Nice to Have (strong plus)
- Experience with Zerto, Veeam, or similar DR orchestration platforms.
- Background in chaos engineering using tools like Gremlin or LitmusChaos.
- Exposure to TISAX, ISO 27001, or other compliance-aligned monitoring.
- Knowledge of Kubernetes and container orchestration for DR environments.
- Previous experience in platform reliability for mission-critical systems.
About Monks
Monks is the global, purely digital, unitary operating brand of S4Capital plc. With a legacy of innovation and specialized expertise, Monks combines an extraordinary range of global marketing and technology services to accelerate business possibilities and redefine how brands and businesses interact with the world. Its integration of systems and workflows delivers unfettered content production, scaled experiences, enterprise-grade technology and data science fueled by AI—managed by the industry's best and most diverse digital talent—to help the world's trailblazing companies outmaneuver and outpace their competition.
Monks was named a Contender in The Forrester Wave: Global Marketing Services. It has remained a constant presence on Adweek's Fastest Growing lists, ranks among Cannes Lions' Top 10 Creative Companies and is the only partner to have been placed in AdExchanger's Programmatic Power Players list every year. In addition to being named Adweek's first AI Agency of the Year (2023), Monks has been recognized by Business Intelligence in its 2024 Excellence in Artificial Intelligence Awards program in three categories: the Individual category, Organizational Winner in AI Strategic Planning and AI Product for its service Monks.Flow. Monks has also garnered the title of Webby Production Company of the Year, won a record number of FWAs and has earned a spot on Newsweek's Top 100 Global Most Loved Workplaces 2023.
We are an equal-opportunity employer committed to building a respectful and empowering work environment for all people to freely express themselves amongst colleagues who embrace diversity in all respects. Including fresh voices and unique points of view in all aspects of our business not only creates an environment where we can all grow and thrive but also increases our potential to produce work that better represents—and resonates with—the world around us.
-
Site Reliability Engineer
7 days ago
Bengaluru, India Relanto Full time. Job Description. Job Title: Site Reliability Engineer. Summary: We are looking for a Site Reliability Engineer to join our Digital & Transformation department. The ideal candidate will have 2-3 years of experience in this field and will be responsible for ensuring the reliability, availability, and performance of our systems and applications. Roles And...
-
Site Reliability Engineer
3 weeks ago
India, IN Sonata Software Full time. We're Hiring: Senior Site Reliability Engineer. Location: Onsite (Office: Hyderabad – Mandatory from Day 1). Employment Type: Full-time. Notice Period: Immediate to 15 Days Only. Experience: 8+ Years. About the Role: We're looking for a Senior Site Reliability Engineer (SRE) to lead reliability initiatives across our production systems. This is a high-impact...
-
Site Reliability Engineer
2 weeks ago
India Akamai Full time ₹ 5,00,000 - ₹ 15,00,000 per year. Do you want to grow your career in Linux and Site Reliability Engineering? Would you like to contribute to the foundation of a new public cloud platform? Join our IaaS Site Reliability Engineering (SRE) team. We design, develop, and operate infrastructure and services that power the backbone of our cloud platform. This is a rare opportunity to help build a...
-
Site Reliability Engineer
4 weeks ago
Bengaluru, Karnataka, India, Karnataka ViewSonic Full time. Job Requirements: Bachelor's degree in Computer Science, Engineering, or a related field. 3+ years of experience in a relevant role, such as Site Reliability Engineer, DevOps Engineer, or similar, is preferred but not mandatory. Basic understanding of AWS solutions including EC2, S3, CloudWatch, Lambda, and RDS. Interest and understanding of Platform Engineering...
-
Site Reliability Engineer
4 days ago
India CareerUS Solutions Full time. Job Description. Position Overview: The Site Reliability Engineer (SRE) is responsible for ensuring the stability, scalability, performance, and reliability of production systems and services. This role bridges software development and operations, using automation, monitoring, and performance optimization to build resilient systems that can scale efficiently...
-
Site Reliability Engineer
2 weeks ago
Bengaluru, Karnataka, India, Karnataka WhiteLotus Talent Partners Full time. We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high system...
-
Site Reliability Engineer
4 weeks ago
Bengaluru, Karnataka, India, Karnataka HDFC Limited Full time. Hiring for Lead / Sr Site Reliability Engineer for Mumbai & Bangalore Location. Experience: 8 - 14 Years. Job Purpose: Analysing, troubleshooting, and designing vital services, platforms, and infrastructure on GCP while always thinking about reliability, scalability, resilience, security, and performance. Job Responsibilities: Help build a Site Reliability...
-
Site Reliability Engineer
4 weeks ago
Noida, Uttar Pradesh, India, Ghaziabad CorroHealth Full time. We are seeking a highly skilled Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a deep understanding of both software engineering and systems administration, with a focus on creating scalable and reliable systems. You will work closely with development and operations teams to ensure the reliability, availability, and...
-
Site Reliability Engineer
2 weeks ago
India Zensar Technologies Full time ₹ 20,00,000 - ₹ 25,00,000 per year. The candidate should be a skilled and proactive Site Reliability Engineer (SRE) with 10 years of experience. The SRE will be responsible for ensuring the reliability, scalability, and performance of our systems and infrastructure. This role blends software engineering with IT operations to build fault-tolerant, self-healing systems and drive continuous improvement across...
-
Site Reliability Engineer
3 weeks ago
Bengaluru, Karnataka, India, Karnataka IntraEdge Full time. Job Title: Site Reliability Engineer (SRE) – Production Support. Location: Bengaluru. Job Summary: We are looking for a skilled Site Reliability Engineer (SRE) with strong experience in production support, DevOps practices, and cloud infrastructure management. The ideal candidate will be responsible for maintaining the reliability, performance, and scalability...