Site Reliability Engineer

3 days ago


India Xebia Full time

We are looking for a highly skilled AWS Engineer with strong Python development and Chaos Engineering expertise to design, build, and validate resilient, scalable, and automated cloud-native environments. The ideal candidate will combine cloud engineering, DevOps, and chaos experimentation to improve the reliability, fault tolerance, and operational efficiency of critical systems.

Key Responsibilities

Cloud Engineering (AWS):

  • Architect, implement, and manage secure, scalable, and cost-efficient AWS infrastructure (EC2, Lambda, EKS, S3, RDS, IAM, CloudFront, etc.).
  • Automate infrastructure provisioning and configuration using Terraform / CloudFormation and AWS SDKs.
  • Manage containerized workloads (Docker, Kubernetes, EKS).

Python Development:

  • Build automation scripts, deployment utilities, and infrastructure tooling using Python (Boto3, Flask, FastAPI, etc.).
  • Develop custom monitoring/alerting integrations with APIs, SDKs, and third-party observability platforms.
  • Implement self-healing and resilience-focused automation scripts.

Chaos Engineering & Resiliency:

  • Design and execute chaos experiments (fault injection, latency, outages, resource failures) to validate system resilience.
  • Use tools like Gremlin, Litmus, Chaos Mesh, or AWS Fault Injection Simulator.
  • Partner with SRE and development teams to define SLIs, SLOs, and error budgets.
  • Document learnings from chaos tests and improve incident response & recovery playbooks.

DevOps & Observability:

  • Build and maintain CI/CD pipelines for automated deployments (Jenkins, GitHub Actions, GitLab CI, AWS CodePipeline).
  • Integrate observability frameworks (Prometheus, Grafana, ELK/EFK, CloudWatch, Datadog) for monitoring and tracing.
  • Ensure proactive alerting and real-time visibility into system health.

Security & Compliance:

  • Apply AWS security best practices for IAM, networking, and data protection.
  • Ensure compliance with internal and external regulatory frameworks (SOC2, ISO, GDPR, etc.).

Required Skills & Qualifications

  • 6–10 years of experience in Cloud, DevOps, or SRE roles.
  • Strong hands-on expertise in AWS Cloud (certifications preferred: AWS DevOps Engineer / Solutions Architect).
  • Advanced Python development skills for automation and tooling (Boto3 a must).
  • Experience designing and running chaos experiments (Gremlin, AWS FIS, Litmus, Chaos Mesh, or custom Python-based fault injection).
  • Solid knowledge of IaC (Terraform / CloudFormation).
  • Proficiency in containers & orchestration (Docker, Kubernetes, EKS).
  • Strong background in monitoring, observability, and incident management.
  • Familiarity with the DevOps toolchain (CI/CD, Git, Jenkins, GitLab, CodePipeline).
  • Good understanding of resilient architectures, reliability principles, and disaster recovery.

Preferred Skills

  • Knowledge of Go / Shell scripting in addition to Python.
  • Experience with chaos testing in production-like environments.
  • Exposure to multi-cloud or hybrid-cloud environments.
  • Strong problem-solving and debugging skills.

What We Offer

  • Opportunity to lead cloud reliability & chaos engineering initiatives.
  • Culture focused on automation, resilience, and continuous improvement.
  • Growth opportunities through certifications, R&D projects, and leadership roles.


  • India Concord Full time

    SRE Sr. Engineers (Individual Contributors) Key Attributes: - Strong SRE (Site Reliability Engineering) experience - DevOps skills – CI/CD, monitoring, automation, infrastructure as code, etc. - Excellent troubleshooting and debugging skills (infrastructure + application level) - Perseverance – must push through complex/challenging issues without...


  • India Employ Full time

    Role - Site Reliability Engineer (SRE)/ Platform Engineering/ or DevOps Engineering roles Location – Bangalore/ Remote Type - Contract Work Ex - 4-6 yrs We're working with an AI product company that's building the next generation of GenAI powered developer platforms. We're looking for an experienced Site Reliability Engineer to join their Platform Engineering...


  • India Synechron Full time

    We have an immediate opportunity for SRE (Senior Site Reliability Engineer), 5 to 9 years. Synechron – Bangalore Job Role: SRE (Senior Site Reliability Engineer) Job Location: Bangalore Notice Period: Within 30 days About Synechron We began life in 2001 as a small, self-funded team of technology specialists. Since then, we’ve grown our organization to 14,500+...


  • India Employ Full time

    Role - Site Reliability Engineer (SRE)/ Platform Engineering/ or DevOps Engineering roles Location – Fully Remote Type - 6 months Contract Work Ex - 5+ Yrs We’re working with an AI product company that’s building the next generation of GenAI powered developer platforms. We’re looking for an experienced Site Reliability...


  • India ValueLabs Full time ₹ 5,00,000 - ₹ 10,00,000 per year

    Experienced in SRE or Site Reliability Engineering. Design, implement, and maintain automated processes for deploying, monitoring, and managing applications on Azure DevOps. Collaborate with cross-functional teams to optimize system performance, reliability, and scalability. Develop and maintain tools for continuous integration, continuous deployment (CI/CD),...


  • India Akamai Full time US$ 90,000 - US$ 1,20,000 per year

    Do you want to grow your career in Linux and Site Reliability Engineering? Would you like to contribute to the foundation of a new public cloud platform? Join our IaaS Site Reliability Engineering (SRE) team. We design, develop, and operate infrastructure and services that power the backbone of our cloud platform. This is a rare opportunity to help build a...

