
Platform Reliability Engineer
2 weeks ago
ANSR is hiring for one of its clients.
About T-Mobile:
T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional service experience.
About TMUS Global Solutions:
TMUS Global Solutions is a world-class technology powerhouse accelerating the company’s global digital transformation. With a culture built on growth, inclusivity, and global collaboration, the teams here drive innovation at scale, powered by bold thinking.
TMUS India Private Limited is a subsidiary of T-Mobile US, Inc. and operates as TMUS Global Solutions.
About the Role:
As a Site Reliability Engineer (SRE), you will be a key member of the CFL Platform Engineering and Operations team, responsible for building and maintaining large-scale distributed systems that are observable, scalable, and resilient. This role sits at the intersection of software engineering and infrastructure operations, ensuring high availability and performance of production systems through automation, monitoring, and proactive engineering. You'll work closely with development, DevOps, and cloud platform teams to improve deployment strategies, incident response, and system health insights. This is a hands-on role for engineers who are passionate about operational excellence, reducing toil, and improving system reliability through code.
What You Will Do:
- Ensure high availability and performance of production platforms through monitoring, alerting, and incident management
- Design and implement resiliency patterns such as circuit breakers, failovers, retries, and health checks
- Develop automation to reduce manual operational work and improve system efficiency
- Support CI/CD workflows and infrastructure automation using tools like Terraform and Helm
- Collaborate with developers to enhance service deployment and rollback mechanisms
- Build and maintain observability tooling including dashboards, logs, and metrics
- Analyze performance data and use it to guide optimizations and issue detection
- Participate in on-call rotations, incident triage, and post-incident analysis
- Write and maintain operational documentation, including runbooks and playbooks
- Support development teams in achieving service-level objectives (SLOs) and operational readiness
What You Will Bring:
- Bachelor’s degree in Computer Science, Engineering, or a related technical field
- 2-5 years of experience in SRE, infrastructure, DevOps, or related engineering roles
- Proficiency in scripting or programming (Python, Go, or Bash preferred)
- Strong experience with Linux systems and cloud environments (Azure preferred; AWS/GCP also relevant)
- Hands-on experience with Kubernetes and containerized services
- Familiarity with observability tools such as Prometheus, Grafana, Splunk, or OpenTelemetry
- Exposure to incident response frameworks, postmortems, and error budgets
- Understanding of core SRE concepts: SLOs, SLIs, and service reliability metrics
- Experience with CI/CD tools (e.g., GitLab CI/CD, Jenkins, Spinnaker)
- Working knowledge of infrastructure tools such as HAProxy, RabbitMQ, or similar
- Strong analytical and troubleshooting skills for distributed systems
- Clear communication skills and ability to work cross-functionally
- A continuous improvement mindset focused on reducing operational toil and enhancing developer experience
Must Have Skills:
- Application & Microservice: Java, Spring Boot, API & Service Design
- Any CI/CD Tools: GitLab Pipeline/Test Automation/GitHub Actions/Jenkins/CircleCI
- App Platform: Docker & Containers (Kubernetes)
- Any Databases: SQL & NoSQL (Cassandra/Oracle/Snowflake/MongoDB)
- Any Messaging: Kafka, RabbitMQ
- Any Observability/Monitoring: Splunk/Grafana/OpenTelemetry/ELK Stack/Datadog/New Relic/Prometheus
- Incident/Change/Problem Management
Nice To Have:
- Experience defining SLIs/SLOs