Site Reliability Engineer III

6 days ago


Hyderabad, Telangana, India | Sonata Software | Full time | ₹ 20,00,000 - ₹ 25,00,000 per year

Role & responsibilities

Role: Site Reliability Engineer (SRE) III, Data Engineering

Location: Hyderabad (Hybrid)

Employment Type: Full Time

Experience: 7–12 years in site reliability, cloud-based data infrastructure, data pipeline observability, automation, and high-availability engineering within EdTech platforms (2U)

Primary Skills (Must-Have): AWS, CI/CD, Jenkins, Infrastructure as Code (IaC), Terraform, Kubernetes

Secondary Skills (Good-to-Have): AWS systems; Dataiku data; platform updates and patching

Tools & Platforms

Data Warehousing & Processing: Snowflake, Redshift, Apache Airflow, dbt

CI/CD & Deployment: Jenkins, GitHub Actions, AWS CodePipeline, Terraform

Cloud & Event Processing: AWS Lambda, API Gateway, SNS/SQS, Kafka, Step Functions

Monitoring & Logging: DataDog, AWS CloudWatch, Prometheus, Splunk

Incident Management: PagerDuty, Opsgenie, AWS Health Dashboard

Collaboration & Code Review: GitHub, Jira, Confluence

Key Responsibilities

Data Pipeline Reliability & Observability:

  • Maintain and optimize highly available, fault-tolerant infrastructure for data pipelines, ETL jobs, and real-time data processing

  • Implement end-to-end monitoring of Airflow DAGs, Snowflake queries, and AWS-based data workflows

  • Automate data pipeline health checks, error handling, and auto-remediation strategies

Infrastructure & Cloud Automation:

  • Deploy and manage AWS-based data infrastructure using Terraform and CloudFormation

  • Optimize Kubernetes (EKS) clusters for processing large-scale datasets and real-time analytics

  • Ensure high availability and cost-efficient scaling for Redshift, Snowflake, and data storage solutions

Performance, Monitoring & Incident Response:

  • Implement real-time monitoring, logging, and alerting using DataDog, AWS CloudWatch, and Prometheus

  • Define and track SLOs, SLIs, and error budgets to improve data reliability and uptime

  • Conduct Root Cause Analysis (RCA), security audits, and post-mortems for incidents

Security & Compliance:

  • Ensure GDPR, CCPA, and SOC 2 compliance for data storage, access controls, and retention policies

  • Implement AWS security best practices (IAM, KMS, Shield, WAF) to secure data access and encryption

  • Secure API gateways, authentication mechanisms, and data lake permissions to prevent unauthorized access

Collaboration & Leadership:

  • Work closely with data engineers, analytics teams, and DevOps engineers to enhance data platform reliability

  • Participate in incident response drills, disaster recovery planning, and security compliance reviews

  • Advocate for best practices in automation, cost optimization, and cloud-native data solutions


