
3 Days Left: Principal Engineer, Site Reliability
3 days ago
About TMUS Global Solutions
T-Mobile is America's supercharged Un-carrier, challenging conventions and setting new standards in wireless. With the nation's largest and fastest 5G network, T-Mobile delivers advanced connectivity and unmatched value to millions across the U.S. We're unwaveringly obsessed with providing the best possible service experience, driven by a spirit of disruption that fuels competition and innovation in wireless and beyond. TMUS India Private Limited is a subsidiary of T-Mobile US, Inc. and operates as TMUS Global Solutions.

About the Role
As a Principal SRE, you will be a key member of the CFL Platform Engineering and Operations team. You will lead reliability engineering for AI-powered platforms supporting LLM applications, AI gateways, and enterprise-scale services across finance, credit, collections, and document systems. You will design and implement observability and incident response frameworks, scale high-performance infrastructure, and champion SRE best practices to support secure, automated, and resilient systems.

What You'll Do
- Architect observability and incident response pipelines for LLM, API, and backend systems
- Define SLAs, SLIs, alerts, and dashboards for latency, throughput, and availability
- Lead high-severity incident response, root cause analysis, and system recovery
- Collaborate with AI Platform and Security teams to enforce operational guardrails
- Implement automation-first strategies using GitLab CI/CD, Terraform, and deployment tooling
- Guide infrastructure tuning, capacity planning, and cost optimization
- Drive monitoring across hybrid clouds using Prometheus, Grafana, Splunk, and OpenTelemetry
- Support AIOps, model observability, policy enforcement, and audit readiness
- Mentor senior SREs and foster a high-ownership, technical excellence culture

What You'll Bring
- Bachelor's or Master's in Computer Science, Engineering, or a related field
- 7-12 years in SRE, infrastructure, or platform roles in distributed systems
- Strong experience in incident management, AI/ML observability, and performance engineering
- Hands-on expertise with OpenAI APIs, inference systems, AI gateways, and secure APIs
- Proficiency in Python, Java, Bash, PowerShell, and YAML
- Deep knowledge of CI/CD workflows, GitLab pipelines, and SDLC processes
- Experience with Kafka, HAProxy, RabbitMQ, Oracle DB, and MongoDB
- Proven success in scaling cloud-native platforms on Azure, AWS, GCP, or OCI
- Familiarity with AIOps, latency scoring, policy validation, and secure AI operations
- Background in compliance, governance, and enterprise risk management for AI systems
- Advanced debugging skills across data, infrastructure, networking, and app layers
- Leadership in chaos engineering, SLO-based operations, and system resilience

Must Have Skills
- Application and Microservice: Java, Spring Boot, API and Service Design
- Any CI/CD Tools: GitLab Pipeline, Test Automation, GitHub Actions, Jenkins, Circle CI
- App Platform: Docker, Containers, Kubernetes
- Any Databases: SQL, NoSQL, Cassandra, Oracle, Snowflake, MongoDB
- Any Messaging: Kafka, RabbitMQ
- Any Observability and Monitoring: Splunk, Grafana, OpenTelemetry, ELK Stack, Datadog, New Relic, Prometheus
- Incident, Change, and Problem Management

Nice To Have
- Compliance-aligned continuity planning (PCI, SOX)
- Error-budget pacts with product org leadership
- Executive Incident, Change, and Problem risk reporting
- Observability cost vs. coverage trade-offs
- Org-wide reliability governance strategy
-
3 Days Left: Principal Site Reliability Engineer
2 weeks ago
Hyderabad, Telangana, India Cubic Transportation Systems Full time Hiring Principal Site Reliability Engineer Experience: 12+ Years Location: Hyderabad Notice: Immediate to 30 Days We're seeking an experienced Site Reliability Engineer (SRE) to ensure our services are robust, scalable, secure, and maintainable. You will blend software engineering and systems operations to automate processes, monitor performance, lead incident...
-
3 Days Left: Site Reliability Engineer III
2 weeks ago
Hyderabad, Telangana, India JP Morgan Chase & Co. Full time Job Description There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems. As a Site Reliability Engineer III at JPMorgan Chase within the Consumer & Community Banking, you will solve complex and broad...
-
Site Reliability Engineer
2 weeks ago
Hyderabad, Telangana, India IntraEdge Full time Site Reliability Engineer Experience: 7+ Years Location: Hyderabad Hybrid 4-day office and 1 Day remote Skills for Principal: - Strong leadership and people management skills. - Exceptional technical proficiency in Pearson's technology stack. - Advanced project management capabilities. - Excellent communication and collaboration skills. - Adept at risk assessment and...
-
Site Reliability Engineer
2 weeks ago
Hyderabad, Telangana, India IntraEdge Full time Site Reliability Engineer Experience: 7+ Years Location: Hyderabad Hybrid 4-day office and 1 Day remote Skills for Principal: Strong leadership and people management skills. Exceptional technical proficiency in Pearson's technology stack. Advanced project management capabilities. Excellent communication and collaboration skills. Adept at risk assessment...
-
Site Reliability Engineer
7 days ago
Hyderabad, Telangana, India IntraEdge Full time Site Reliability Engineer Experience: 7+ Years Location: Hyderabad Hybrid 4-day office and 1 Day remote Skills for Principal: Strong leadership and people management skills. Exceptional technical proficiency in Pearson's technology stack. Advanced project management capabilities. Excellent communication and collaboration skills. Adept at risk assessment and crisis...
-
Principal Site Reliability Engineer
2 weeks ago
Hyderabad, Telangana, India Cubic Transportation Systems Full time Hiring Principal Site Reliability Engineer Experience: 12+ Years Location: Hyderabad Notice: Immediate to 30 Days We're seeking an experienced Site Reliability Engineer (SRE) to ensure our services are robust, scalable, secure, and maintainable. You will blend software engineering and systems operations to automate processes, monitor performance, lead...
-
Principal Site Reliability Engineer
1 week ago
Hyderabad, Telangana, India Cubic Transportation Systems Full time ₹ 15,00,000 - ₹ 20,00,000 per year Hiring Principal Site Reliability Engineer Experience: 12+ Years Location: Hyderabad Notice: Immediate to 30 Days We're seeking an experienced Site Reliability Engineer (SRE) to ensure our services are robust, scalable, secure, and maintainable. You will blend software engineering and systems operations to automate processes, monitor performance, lead incident...
-
Principal Site Reliability Engineer
1 week ago
Hyderabad, Telangana, India Cubic Transportation Full time ₹ 15,00,000 - ₹ 20,00,000 per year Hiring Principal Site Reliability Engineer Experience: 12 to 18 Years Location: Hyderabad Notice Period: Immediate to 30 Days Key Responsibilities: Design, deploy, and maintain scalable, secure applications and infrastructure in cloud or hybrid environments. Implement and manage robust monitoring, alerting, and observability systems. Automate recurrent operational...
-
Site Reliability Engineer
2 weeks ago
Hyderabad, Telangana, India IntraEdge Full time Site Reliability Engineer Experience: 7+ Years Location: Hyderabad Skills for Principal: ~ Strong leadership and people management skills. ~ Exceptional technical proficiency in Pearson's technology stack. ~ Advanced project management capabilities. ~ Excellent communication and collaboration skills. ~ Adept at risk assessment and crisis management. ~...
-
Site Reliability Engineer
2 weeks ago
Hyderabad, Telangana, India IntraEdge Full time US$ 1,20,000 - US$ 2,00,000 per year Site Reliability Engineer Experience: 7+ Years Location: Hyderabad Skills for Principal: Strong leadership and people management skills. Exceptional technical proficiency in Pearson's technology stack. Advanced project management capabilities. Excellent communication and collaboration skills. Adept at risk assessment and crisis management. Strategic thinking with a...