Site Reliability Engineer

3 days ago

India Xebia Full time

We are looking for a highly skilled AWS Engineer with strong Python development and Chaos Engineering expertise to design, build, and validate resilient, scalable, and automated cloud-native environments. The ideal candidate will combine cloud engineering, DevOps, and chaos experimentation to improve the reliability, fault tolerance, and operational efficiency of critical systems.

Key Responsibilities

Cloud Engineering (AWS):

Architect, implement, and manage secure, scalable, and cost-efficient AWS infrastructure (EC2, Lambda, EKS, S3, RDS, IAM, CloudFront, etc.).
Automate infrastructure provisioning and configuration using Terraform / CloudFormation and AWS SDKs.
Manage containerized workloads (Docker, Kubernetes, EKS).

Python Development:

Build automation scripts, deployment utilities, and infrastructure tooling using Python (Boto3, Flask, FastAPI, etc.).
Develop custom monitoring/alerting integrations with APIs, SDKs, and third-party observability platforms.
Implement self-healing and resilience-focused automation scripts.

Chaos Engineering & Resiliency:

Design and execute chaos experiments (fault injection, latency, outages, resource failures) to validate system resilience.
Use tools like Gremlin, Litmus, Chaos Mesh, or AWS Fault Injection Simulator.
Partner with SRE and development teams to define SLIs, SLOs, and error budgets.
Document learnings from chaos tests and improve incident response & recovery playbooks.

DevOps & Observability:

Build and maintain CI/CD pipelines for automated deployments (Jenkins, GitHub Actions, GitLab CI, AWS CodePipeline).
Integrate observability frameworks (Prometheus, Grafana, ELK/EFK, CloudWatch, Datadog) for monitoring and tracing.
Ensure proactive alerting and real-time visibility into system health.

Security & Compliance:

Apply AWS security best practices for IAM, networking, and data protection.
Ensure compliance with internal and external regulatory frameworks (SOC2, ISO, GDPR, etc.).

Required Skills & Qualifications

6–10 years of experience in Cloud, DevOps, or SRE roles.
Strong hands-on expertise in AWS Cloud (certifications preferred: AWS DevOps Engineer / Solutions Architect).
Advanced Python development skills for automation and tooling (Boto3 a must).
Experience designing and running chaos experiments (Gremlin, AWS FIS, Litmus, Chaos Mesh, or custom Python-based fault injection).
Solid knowledge of IaC (Terraform / CloudFormation).
Proficiency in containers & orchestration (Docker, Kubernetes, EKS).
Strong background in monitoring, observability, and incident management.
Familiarity with the DevOps toolchain (CI/CD, Git, Jenkins, GitLab, CodePipeline).
Good understanding of resilient architectures, reliability principles, and disaster recovery.

Preferred Skills

Knowledge of Go / Shell scripting in addition to Python.
Experience with chaos testing in production-like environments.
Exposure to multi-cloud or hybrid-cloud environments.
Strong problem-solving and debugging skills.

What We Offer

Opportunity to lead cloud reliability & chaos engineering initiatives.
Culture focused on automation, resilience, and continuous improvement.
Growth opportunities through certifications, R&D projects, and leadership roles.

Site Reliability Engineer

2 days ago

India Concord Full time

SRE Sr. Engineers (Individual Contributors) Key Attributes: - Strong SRE (Site Reliability Engineering) experience - DevOps skills – CI/CD, monitoring, automation, infrastructure as code, etc. - Excellent troubleshooting and debugging skills (infrastructure + application level) - Perseverance – must push through complex/challenging issues without...
Site Reliability Engineer

2 weeks ago

India Employ Full time

Role - Site Reliability Engineer (SRE)/ Platform Engineering/ or DevOps Engineering roles Location – Bangalore/ Remote Type - Contract Work Ex - 4-6 yrs We're working with an AI product company that's building the next generation of GenAI powered developer platforms. We're looking for an experienced Site Reliability Engineer to join their Platform Engineering...
Site Reliability Engineer

3 days ago

India Synechron Full time

We have an immediate opportunity for SRE (Senior Site Reliability Engineer), 5 to 9 years. Synechron – Bangalore Job Role: SRE (Senior Site Reliability Engineer) Job Location: Bangalore Notice Period: Within 30 days About Synechron We began life in 2001 as a small, self-funded team of technology specialists. Since then, we’ve grown our organization to 14,500+...
Site Reliability Engineer

3 days ago

India Employ Full time

Role - Site Reliability Engineer (SRE)/ Platform Engineering/ or DevOps Engineering roles Location – Fully Remote Type - 6 months Contract Work Ex - 5+ Yrs We’re working with an AI product company that’s building the next generation of GenAI powered developer platforms. We’re looking for an experienced Site Reliability...
Site Reliability Engineer

2 weeks ago

India ValueLabs Full time ₹ 5,00,000 - ₹ 10,00,000 per year

Experienced in SRE or Site Reliability Engineering. Design, implement, and maintain automated processes for deploying, monitoring, and managing applications on Azure DevOps. Collaborate with cross-functional teams to optimize system performance, reliability, and scalability. Develop and maintain tools for continuous integration, continuous deployment (CI/CD),...
Site Reliability Engineer

2 weeks ago

India Akamai Full time US$ 90,000 - US$ 1,20,000 per year

Do you want to grow your career in Linux and Site Reliability Engineering? Would you like to contribute to the foundation of a new public cloud platform? Join our IaaS Site Reliability Engineering (SRE) team. We design, develop, and operate infrastructure and services that power the backbone of our cloud platform. This is a rare opportunity to help build a...
Site Reliability Engineer

2 hours ago

India Akamai Full time

Do you want to grow your career in Linux and Site Reliability Engineering? Would you like to contribute to the foundation of a new public cloud platform? Join our IaaS Site Reliability Engineering (SRE) team. We design, develop, and operate infrastructure and services that power the backbone of our cloud platform. This is a rare opportunity to help build a...
Site Reliability Engineer

3 days ago

India Concord Full time

SRE Sr. Engineers (Individual Contributors) Key Attributes: Strong SRE (Site Reliability Engineering) experience DevOps skills – CI/CD, monitoring, automation, infrastructure as code, etc. Excellent troubleshooting and debugging skills (infrastructure + application level) Perseverance – must push through complex/challenging issues...
