Principal Reliability Engineer
1 week ago
About T-Mobile
T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional service experience.

TMUS Global Solutions
TMUS Global Solutions is a world-class technology powerhouse accelerating the company’s global digital transformation. With a culture built on growth, inclusivity, and global collaboration, the teams here drive innovation at scale, powered by bold thinking. TMUS India Private Limited is a subsidiary of T-Mobile US, Inc. and operates as TMUS Global Solutions.

About the Role
As a Principal SRE, you will be a key member of the CFL Platform Engineering and Operations team. You will lead reliability engineering for AI-powered platforms supporting LLM applications, AI gateways, and enterprise-scale services across finance, credit, collections, and document systems. You will design and implement observability and incident response frameworks, scale high-performance infrastructure, and champion SRE best practices to support secure, automated, and resilient systems.

What You’ll Do
~ Architect observability and incident response pipelines for LLM, API, and backend systems
~ Define SLAs, SLIs, alerts, and dashboards for latency, throughput, and availability
~ Lead high-severity incident response, root cause analysis, and system recovery
~ Collaborate with AI, Platform, and Security teams to enforce operational guardrails
~ Implement automation-first strategies using GitLab CI/CD, Terraform, and deployment tooling
~ Guide infrastructure tuning, capacity planning, and cost optimization
~ Drive monitoring across hybrid clouds using Prometheus, Grafana, Splunk, and OpenTelemetry
~ Support AIOps, model observability, policy enforcement, and audit readiness
~ Mentor senior SREs and foster a culture of high ownership and technical excellence

What You’ll Bring
~ Bachelor’s or Master’s in Computer Science, Engineering, or a related field
~ 7-12 years in SRE, infrastructure, or platform roles in distributed systems
~ Strong experience in incident management, AI/ML observability, and performance engineering
~ Hands-on expertise with OpenAI APIs, inference systems, AI gateways, and secure APIs
~ Proficiency in Python, Java, Bash/PowerShell, YAML
~ Deep knowledge of CI/CD workflows, GitLab pipelines, and SDLC processes
~ Experience with Kafka, HAProxy, RabbitMQ, Oracle DB, MongoDB
~ Proven success in scaling cloud-native platforms on Azure, AWS, GCP, or OCI
~ Familiarity with AIOps, latency scoring, policy validation, and secure AI operations
~ Background in compliance, governance, and enterprise risk management for AI systems
~ Advanced debugging skills across data, infrastructure, networking, and app layers
~ Leadership in chaos engineering, SLO-based operations, and system resilience

Must Have Skills
~ Application & Microservice: Java, Spring Boot, API & Service Design
~ Any CI/CD Tools: GitLab Pipeline/Test Automation/GitHub Actions/Jenkins/CircleCI
~ App Platform: Docker & Containers (Kubernetes)
~ Any Databases: SQL & NoSQL (Cassandra/Oracle/Snowflake/MongoDB)
~ Any Messaging: Kafka, RabbitMQ
~ Any Observability/Monitoring: Splunk/Grafana/OpenTelemetry/ELK Stack/Datadog/New Relic/Prometheus
~ Incident/Change/Problem Management

Nice To Have
~ Compliance-aligned continuity planning (PCI, SOX)
~ Error-budget pacts with product/org leadership
~ Executive Incident/Change/Problem/risk reporting
~ Observability cost vs coverage trade-offs
~ Org-wide reliability governance strategy
-
Site Reliability Engineer
3 days ago
Hyderabad, Telangana, India Oracle Financial Services Software Ltd Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Principal Site Reliability Engineer: Oracle is seeking a motivated Principal Site Reliability Engineer who thrives in a fast-paced, rapidly evolving technology environment. This position requires broad knowledge of Linux administration, AI technologies, software development, cloud computing, networking, cloud security, performance analysis and...
-
Principal Site Reliability Engineer
5 days ago
Hyderabad, Telangana, India Oracle Full time ₹ 20,00,000 - ₹ 60,00,000 per year
Oracle is seeking a motivated Principal Site Reliability Engineer who thrives in a fast-paced, rapidly evolving technology environment. This position requires broad knowledge of Linux administration, AI technologies, software development, cloud computing, networking, cloud security, performance analysis and monitoring to provide the stability,...
-
Principal Site Reliability Engineer
2 weeks ago
Hyderabad, Telangana, India Cubic Transportation Systems Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Hiring Principal Site Reliability Engineer
Experience: 12+ Years
Location: Hyderabad
Notice: Immediate to 30 Days
We're seeking an experienced Site Reliability Engineer (SRE) to ensure our services are robust, scalable, secure, and maintainable. You will blend software engineering and systems operations to automate processes, monitor performance, lead incident...
-
Principal Site Reliability Engineer
2 weeks ago
Hyderabad, Telangana, India Cubic Transportation Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Hiring Principal Site Reliability Engineer
Experience: 12 to 18 Years
Location: Hyderabad
Notice Period: Immediate to 30 Days
Key Responsibilities
~ Design, deploy, and maintain scalable, secure applications and infrastructure in cloud or hybrid environments
~ Implement and manage robust monitoring, alerting, and observability systems
~ Automate recurrent operational...
-
Principal Network Reliability Engineer
3 weeks ago
Hyderabad, India Oracle Full time
Job Description
The Oracle Cloud Infrastructure (OCI) delivers mission-critical applications for top tier enterprises around the world. Our cloud offers unmatched hyper-scale, multi-tenant services deployed in more than 40 regions worldwide. The mission of our Network Reliability Engineering team is to provide exceptional...
-
Principal Engineer, Site Reliability
4 weeks ago
Hyderabad, India ANSR Full time
ANSR is hiring for one of its clients.
About T-Mobile: T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America's supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional...
-
Head of reliability
3 weeks ago
Hyderabad, India ANSR Full time
TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional service experience. TMUS Global Solutions is a world-class technology powerhouse...
-
Principal Engineer, Site Reliability
4 weeks ago
Hyderabad, India ANSR Full time
ANSR is hiring for one of its clients.
About T-Mobile: T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional...
-
Principal engineer, site reliability
3 weeks ago
Hyderabad, India ANSR Full time
ANSR is hiring for one of its clients.
About T-Mobile: T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional...