Site Reliability Engineer

4 days ago


India Xebia Full time
We are looking for a highly skilled AWS Engineer with strong Python development and Chaos Engineering expertise to design, build, and validate resilient, scalable, and automated cloud-native environments. The ideal candidate will combine cloud engineering, DevOps, and chaos experimentation to improve reliability, fault tolerance, and operational efficiency of critical systems.

Cloud Engineering (AWS):

Architect, implement, and manage secure, scalable, and cost-efficient AWS infrastructure (EC2, Lambda, EKS, S3, RDS, IAM, CloudFront, etc.).

Automate infrastructure provisioning and configuration using Terraform / CloudFormation and AWS SDKs.

Manage containerized workloads (Docker, Kubernetes, EKS).

Python Development:

Build automation scripts, deployment utilities, and infrastructure tooling using Python (Boto3, Flask, FastAPI, etc.).

Develop custom monitoring/alerting integrations with APIs, SDKs, and third-party observability platforms.

Chaos Engineering & Resiliency:

Use tools like Gremlin, Litmus, Chaos Mesh, or AWS Fault Injection Simulator.

DevOps & Observability:

Build and maintain CI/CD pipelines for automated deployments (Jenkins, GitHub Actions, GitLab CI, AWS CodePipeline).

Integrate observability frameworks (Prometheus, Grafana, ELK/EFK, CloudWatch, Datadog) for monitoring and tracing.

Apply AWS security best practices for IAM, networking, and data protection.

Ensure compliance with internal and external regulatory frameworks (SOC2, ISO, GDPR, etc.).

6–10 years of experience in Cloud, DevOps, or SRE roles.

Strong hands-on expertise in AWS Cloud (certifications preferred: AWS DevOps Engineer / Solutions Architect).

Advanced Python development skills for automation and tooling (Boto3 a must).

Experience designing and running chaos experiments (Gremlin, AWS FIS, Litmus, Chaos Mesh, or custom Python-based fault injection).

Proficiency in containers & orchestration (Docker, Kubernetes, EKS).

Strong background in monitoring, observability, and incident management.

Familiarity with DevOps toolchain (CI/CD, Git, Jenkins, GitLab, CodePipeline).

Knowledge of Go / Shell scripting in addition to Python.

Experience with chaos testing in production-like environments.

Exposure to multi-cloud or hybrid-cloud environments.

Opportunity to lead cloud reliability & chaos engineering initiatives.

Growth opportunities through certifications, R&D projects, and leadership roles.

  • India Employ Full time

    Role - Site Reliability Engineer (SRE)/ Platform Engineering/ or DevOps Engineering roles. Location – Bangalore/ Remote. Type - Contract. Work Ex - 4-6 yrs. We're working with an AI product company that's building the next generation of GenAI-powered developer platforms. We're looking for an experienced Site Reliability Engineer to join their Platform Engineering...


  • India ValueLabs Full time ₹ 5,00,000 - ₹ 10,00,000 per year

    Experienced in SRE or Site Reliability Engineering. Design, implement, and maintain automated processes for deploying, monitoring, and managing applications on Azure DevOps. Collaborate with cross-functional teams to optimize system performance, reliability, and scalability. Develop and maintain tools for continuous integration, continuous deployment (CI/CD),...


  • India Akamai Full time US$ 90,000 - US$ 1,20,000 per year

    Do you want to grow your career in Linux and Site Reliability Engineering? Would you like to contribute to the foundation of a new public cloud platform? Join our IaaS Site Reliability Engineering (SRE) team. We design, develop, and operate infrastructure and services that power the backbone of our cloud platform. This is a rare opportunity to help build a...


  • India Zensar Technologies Full time ₹ 1,04,000 - ₹ 1,30,878 per year

    We are seeking a skilled and proactive Site Reliability Engineer (SRE) with 10 years of experience. The SRE will be responsible for ensuring the reliability, scalability, and performance of our systems and infrastructure. This role blends software engineering with IT operations to build fault-tolerant, self-healing systems and drive continuous improvement across...


  • India beBeeReliability Full time ₹ 20,00,000 - ₹ 25,00,000

    We are seeking a seasoned Site Reliability Engineer to join our team. This role is focused on leading the operational health of our platforms, ensuring they deliver highly reliable financial applications and data services. This critical position will play a pivotal role in ensuring the stability, scalability, and operational excellence of Accounting and...


  • India Rackspace Full time US$ 1,25,000 - US$ 1,75,000 per year

    Site Reliability Engineer / Observability Engineer Public Cloud - Offerings and Delivery – Workforce Mgmt & Delivery Ops / Full - Time / Remote Rackspace is building up its Professional Services Center of Excellence on Application Performance Monitoring Suites. If you enjoy solving complex business problems and can contribute to building next generation of...


  • India New Era Technology Full time US$ 90,000 - US$ 1,20,000 per year

    Join New Era Technology, where People First is at the heart of everything we do. With a global team of over 4,500 professionals, we're committed to creating a workplace where everyone feels valued, empowered, and inspired to grow. Our mission is to securely connect people, places, and information with end-to-end technology solutions at scale. At New...


  • India Synechron Full time

    Good day, we have an immediate opportunity for a Senior Site Reliability Engineer. Job Role: Senior Site Reliability Engineer. Job Location: Synechron (Bengaluru/Pune). Experience: 8 to 15 years. Notice: Immediate Joiner. About Company: At Synechron, we believe in the power of digital to transform businesses for the better. Our global consulting firm combines creativity and...


  • India CES Full time

    We're looking for a highly skilled Site Reliability Engineer to help us build, manage, and scale modern infrastructure systems for high-availability applications. If you're passionate about automation, cloud platforms, and solving tough operational challenges, we would love to hear from you. Key Skills and Competencies - 3+ years of extensive experience with...