
Site Reliability Engineer
3 days ago
We are looking for a highly skilled AWS Engineer with strong Python development and Chaos Engineering expertise to design, build, and validate resilient, scalable, and automated cloud-native environments. The ideal candidate will combine cloud engineering, DevOps, and chaos experimentation to improve reliability, fault tolerance, and operational efficiency of critical systems.
Key Responsibilities
Cloud Engineering (AWS):
- Architect, implement, and manage secure, scalable, and cost-efficient AWS infrastructure (EC2, Lambda, EKS, S3, RDS, IAM, CloudFront, etc.).
- Automate infrastructure provisioning and configuration using Terraform / CloudFormation and AWS SDKs.
- Manage containerized workloads (Docker, Kubernetes, EKS).
Python Development:
- Build automation scripts, deployment utilities, and infrastructure tooling using Python (Boto3, Flask, FastAPI, etc.).
- Develop custom monitoring/alerting integrations with APIs, SDKs, and third-party observability platforms.
- Implement self-healing and resilience-focused automation scripts.
Chaos Engineering & Resiliency:
- Design and execute chaos experiments (fault injection, latency, outages, resource failures) to validate system resilience.
- Use tools like Gremlin, Litmus, Chaos Mesh, or AWS Fault Injection Simulator.
- Partner with SRE and development teams to define SLIs, SLOs, and error budgets.
- Document learnings from chaos tests and improve incident response & recovery playbooks.
DevOps & Observability:
- Build and maintain CI/CD pipelines for automated deployments (Jenkins, GitHub Actions, GitLab CI, AWS CodePipeline).
- Integrate observability frameworks (Prometheus, Grafana, ELK/EFK, CloudWatch, Datadog) for monitoring and tracing.
- Ensure proactive alerting and real-time visibility into system health.
Security & Compliance:
- Apply AWS security best practices for IAM, networking, and data protection.
- Ensure compliance with internal and external regulatory frameworks (SOC2, ISO, GDPR, etc.).
Required Skills & Qualifications
- 6–10 years of experience in Cloud, DevOps, or SRE roles.
- Strong hands-on expertise in AWS Cloud (certifications preferred: AWS DevOps Engineer / Solutions Architect).
- Advanced Python development skills for automation and tooling (Boto3 a must).
- Experience designing and running chaos experiments (Gremlin, AWS FIS, Litmus, Chaos Mesh, or custom Python-based fault injection).
- Solid knowledge of IaC (Terraform / CloudFormation).
- Proficiency in containers & orchestration (Docker, Kubernetes, EKS).
- Strong background in monitoring, observability, and incident management.
- Familiarity with DevOps toolchain (CI/CD, Git, Jenkins, GitLab, CodePipeline).
- Good understanding of resilient architectures, reliability principles, and disaster recovery.
Preferred Skills
- Knowledge of Go / Shell scripting in addition to Python.
- Experience with chaos testing in production-like environments.
- Exposure to multi-cloud or hybrid-cloud environments.
- Strong problem-solving and debugging skills.
What We Offer
- Opportunity to lead cloud reliability & chaos engineering initiatives.
- Culture focused on automation, resilience, and continuous improvement.
- Growth opportunities through certifications, R&D projects, and leadership roles.
-
Site Reliability Engineer
2 days ago
India Concord Full time
SRE Sr. Engineers (Individual Contributors)
Key Attributes: Strong SRE (Site Reliability Engineering) experience; DevOps skills – CI/CD, monitoring, automation, infrastructure as code, etc.; Excellent troubleshooting and debugging skills (infrastructure + application level); Perseverance – must push through complex/challenging issues without giving up...
-
Site Reliability Engineer
2 weeks ago
India Employ Full time
Role - Site Reliability Engineer (SRE) / Platform Engineering / DevOps Engineering roles
Location – Bangalore / Remote
Type - Contract
Work Ex - 4-6 yrs
We're working with an AI product company that's building the next generation of GenAI-powered developer platforms. We're looking for an experienced Site Reliability Engineer to join their Platform Engineering...
-
Site Reliability Engineer
3 days ago
India Synechron Full time
We have an immediate opportunity for an SRE (Senior Site Reliability Engineer) with 5 to 9 years of experience.
Synechron – Bangalore
Job Role: SRE (Senior Site Reliability Engineer)
Job Location: Bangalore
Notice Period: Within 30 days
About Synechron
We began life in 2001 as a small, self-funded team of technology specialists. Since then, we’ve grown our organization to 14,500+...
-
Site Reliability Engineer
3 days ago
India Employ Full time
Role - Site Reliability Engineer (SRE) / Platform Engineering / DevOps Engineering roles
Location – Fully Remote
Type - 6 months Contract
Work Ex - 5+ Yrs
We’re working with an AI product company that’s building the next generation of GenAI-powered developer platforms. We’re looking for an experienced Site Reliability...
-
Site Reliability Engineer
2 weeks ago
India ValueLabs Full time ₹ 5,00,000 - ₹ 10,00,000 per year
Experienced in SRE (Site Reliability Engineering). Design, implement, and maintain automated processes for deploying, monitoring, and managing applications on Azure DevOps. Collaborate with cross-functional teams to optimize system performance, reliability, and scalability. Develop and maintain tools for continuous integration, continuous deployment (CI/CD),...
-
Site Reliability Engineer
2 weeks ago
India Akamai Full time US$ 90,000 - US$ 1,20,000 per year
Do you want to grow your career in Linux and Site Reliability Engineering? Would you like to contribute to the foundation of a new public cloud platform? Join our IaaS Site Reliability Engineering (SRE) team. We design, develop, and operate infrastructure and services that power the backbone of our cloud platform. This is a rare opportunity to help build a...
-
Site Reliability Engineer
2 hours ago
India Akamai Full time
Do you want to grow your career in Linux and Site Reliability Engineering? Would you like to contribute to the foundation of a new public cloud platform? Join our IaaS Site Reliability Engineering (SRE) team. We design, develop, and operate infrastructure and services that power the backbone of our cloud platform. This is a rare opportunity to help build a...
-
Site Reliability Engineer
3 days ago
India Concord Full time
SRE Sr. Engineers (Individual Contributors)
Key Attributes: Strong SRE (Site Reliability Engineering) experience; DevOps skills – CI/CD, monitoring, automation, infrastructure as code, etc.; Excellent troubleshooting and debugging skills (infrastructure + application level); Perseverance – must push through complex/challenging issues...