Engineer, Site Reliability

3 weeks ago


Hyderabad, India TMUS Global Solutions Full time

About T-Mobile:
T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional service experience.

About TMUS Global Solutions:
TMUS Global Solutions is a world-class technology powerhouse accelerating the company’s global digital transformation. With a culture built on growth, inclusivity, and global collaboration, the teams here drive innovation at scale, powered by bold thinking.
TMUS India Private Limited operates as TMUS Global Solutions.

About the Role:
As a Site Reliability Engineer (SRE), you will be a key member of the CFL Platform Engineering and Operations team, responsible for building and maintaining large-scale, distributed systems that are observable, scalable, and resilient. This role sits at the intersection of software engineering and infrastructure operations, ensuring high availability and performance of production systems through automation, monitoring, and proactive engineering. You'll work closely with development, DevOps, and cloud platform teams to improve deployment strategies, incident response, and system health insights.

This is a hands-on role for engineers who are passionate about operational excellence, reducing toil, and improving system reliability through code.

What You Will Do:
- Ensure high availability and performance of production platforms through monitoring, alerting, and incident management
- Design and implement resiliency patterns such as circuit breakers, failovers, retries, and health checks
- Develop automation to reduce manual operational work and improve system efficiency
- Support CI/CD workflows and infrastructure automation using tools like Terraform and Helm
- Collaborate with developers to enhance service deployment and rollback mechanisms
- Build and maintain observability tooling including dashboards, logs, and metrics
- Analyze performance data and use it to guide optimizations and issue detection
- Participate in on-call rotations, incident triage, and post-incident analysis
- Write and maintain operational documentation, including runbooks and playbooks
- Support development teams in achieving service-level objectives (SLOs) and operational readiness

What You Will Bring:
- Bachelor’s degree in Computer Science, Engineering, or a related technical field
- 2-5 years of experience in SRE, infrastructure, DevOps, or related engineering roles
- Proficiency in scripting or programming (Python, Go, or Bash preferred)
- Strong experience with Linux systems and cloud environments (Azure preferred; AWS/GCP also relevant)
- Hands-on experience with Kubernetes and containerized services
- Familiarity with observability tools such as Prometheus, Grafana, Splunk, or OpenTelemetry
- Exposure to incident response frameworks, postmortems, and error budgets
- Understanding of core SRE concepts: SLOs, SLIs, and service reliability metrics
- Experience with CI/CD tools (e.g., GitLab CI/CD, Jenkins, Spinnaker)
- Working knowledge of infrastructure tools such as HAProxy, RabbitMQ, or similar
- Strong analytical and troubleshooting skills for distributed systems
- Clear communication skills and ability to work cross-functionally
- A continuous improvement mindset focused on reducing operational toil and enhancing developer experience

Must Have Skills:
- Application & Microservice: Java, Spring Boot, API & Service Design
- Any CI/CD Tools: GitLab Pipeline / Test Automation / GitHub Actions / Jenkins / CircleCI
- App Platform: Docker & Containers (Kubernetes)
- Any Databases: SQL & NoSQL (Cassandra/Oracle/Snowflake/MongoDB)
- Any Messaging: Kafka, RabbitMQ
- Any Observability/Monitoring: Splunk / Grafana / OpenTelemetry / ELK Stack / Datadog / New Relic / Prometheus
- Incident/Change/Problem Management

Nice To Have:
- Define SLIs/SLOs



  • Hyderabad, India Talent Worx Full time

    Site Reliability Engineer (SRE) At Talent Worx, we are looking for a dedicated Site Reliability Engineer (SRE) to join our team. This role involves maintaining high availability and reliability of our services through the application of software engineering practices and systems administration skills. The ideal candidate will bridge the gap between...


  • Hyderabad, Telangana, India Jigya Software Services Full time ₹ 1,50,000 - ₹ 28,00,000 per year

    Job Title: Senior Site Reliability Engineer (SRE) - AWS/Kubernetes. Location: Hyderabad - Onsite. Job Type: Full-Time. About the Role: We are looking for a highly skilled and motivated Site Reliability Engineer to design, build, and maintain our high-performance, scalable cloud infrastructure. You will play a critical role in ensuring the reliability, performance, and...


  • Hyderabad, India Sonata Software Full time

    Hello Connections, greetings of the day! We have immediate openings for an SRE role. Role: Site Reliability Engineer. Experience: 7 to 12 years. Work Location: Hyderabad. Notice Period: Immediate. Interested candidates can share their CVs to -


  • Hyderabad, India Talentiser Full time

    Hiring hybrid Site Reliability Engineers for a fast-growing product company building scalable tech solutions and transforming how businesses run mission-critical operations. Our SaaS platform is designed for high performance, reliability, and automation at scale. Your Impact: As a Site Reliability Engineer, you’ll play a key role in ensuring ...


  • Hyderabad, India Sonata Software Full time

    Role: Site Reliability Engineer (SRE) III – Data Engineering. Location: Hyderabad. Employment Type: Full Time. Experience: 7–12 years in site reliability, cloud-based data infrastructure, data pipeline observability, automation, and high-availability engineering within EdTech platforms (2U). Primary Skills (Must-Have): AWS, CI/CD, Jenkins, IAAC,...


  • Hyderabad, India Sonata Software Full time

    Role: Site Reliability Engineer. Location: Hyderabad. Notice Period: Immediate to 20 Days. Employment Type: Full Time. Experience: 7–12 years in site reliability, cloud-based data infrastructure, data pipeline observability, automation, and high-availability engineering within EdTech platforms (2U). Primary Skills (Must-Have): AWS, CI/CD, Jenkins, IAAC,...


  • Hyderabad, India Pythian Full time

    Site Reliability Engineer, Hyderabad. Site Reliability Engineering – Site Reliability Engineering / Full Time / Hybrid. Site Reliability Engineer, Hyderabad-based | Multiple timezones available | Hybrid | Work from Home and the Office. Why Pythian: At Pythian, we are experts in strategic database and analytics services, driving digital transformation and...