Senior Site Reliability Engineer



Hyderabad, Telangana, India | Elios Talent | Full time



Key Highlights

  • Build, scale, and optimize cloud-native infrastructure powering global, high-availability platforms
  • Drive automation-first engineering across AWS, Terraform, CI/CD, observability, and resilient systems
  • Own reliability, uptime, system health, costs, and performance across mission-critical environments
  • Strengthen DevSecOps practices, improving security, delivery velocity, and operational excellence
  • Lead major incident response, troubleshoot complex issues, and uphold production stability at scale


Position Overview

We are seeking a Senior Site Reliability Engineer to drive reliability, automation, and performance for large-scale, cloud-based platforms. This role blends deep technical engineering, systems thinking, DevOps collaboration, and operational leadership.


You will design and implement scalable infrastructure, improve observability, enhance resiliency, manage incident operations, and champion modern DevSecOps practices. This role plays a critical part in supporting tens of thousands of daily users while ensuring platforms remain secure, fast, and highly available.


Key Responsibilities

Cloud Engineering

  • Architect, deploy, and optimize AWS environments using automation and Infrastructure-as-Code
  • Build tooling that increases predictability, stability, and delivery speed
  • Optimize systems for scale, reliability, cost, and performance
  • Maintain repeatable, traceable, and transparent infrastructure through Terraform and automation
  • Monitor cloud spend and usage, ensuring alignment with service-level objectives

Observability & Reliability

  • Own uptime, reliability, system security, performance metrics, and golden signals
  • Lead incident management and triage bridges during major events
  • Enhance telemetry systems (New Relic, Amazon CloudWatch, Datadog) for deep operational visibility
  • Use data-driven analysis to improve system stability and customer experience
  • Ensure architecture and deployment patterns meet SLAs and reliability goals

DevSecOps & Automation

  • Strengthen CI/CD pipelines, code-review practices, and engineering standards
  • Partner with Cybersecurity to address vulnerabilities through automation
  • Support secure, consistent, and scalable delivery workflows across engineering teams

Resiliency Engineering

  • Identify failure points, blast-radius risks, and architectural gaps
  • Run failure-injection / chaos testing to validate resiliency
  • Forecast traffic, plan for seasonal peaks, and scale systems for 2x+ load scenarios
  • Drive improvements to infrastructure and software to meet resiliency targets

Leadership & Collaboration

  • Mentor engineers across levels, promoting high-quality engineering practices
  • Collaborate daily with product, engineering, and security teams in a DevOps model
  • Document, uplift, and share knowledge through cross-team forums and best practices

Qualifications

  • Experience as a software engineer with strong debugging and deployment skills
  • Hands-on expertise with AWS and Terraform (required)
  • Experience with Amazon ECS; Kubernetes/EKS experience strongly preferred
  • Strong proficiency in Python, Golang, Bash, and automation frameworks
  • CI/CD experience with Jenkins, GitHub Enterprise, CircleCI, or similar
  • Ability to troubleshoot across web servers, app servers, OS, networks, storage, and databases
  • Experience running large-scale, high-availability production systems
  • Strong communication, root-cause analysis, and incident leadership skills
  • BS in Computer Science or equivalent industry experience


About Us

We build scalable, secure, and high-performing digital platforms that power global user experiences. By combining cloud engineering, automation, observability, and resilient systems design, we help organizations operate more reliably, innovate faster, and support long-term platform stability and growth.


Why Join Us

Join a forward-thinking engineering organization where reliability, automation, and performance are core values. You’ll work with a modern cloud stack, collaborate with exceptional engineers, and own meaningful technical impact across large-scale applications. This is an opportunity to shape infrastructure strategy, elevate engineering practices, and build systems that support millions with consistency and excellence.


