Sr Engineer, Site Reliability
3 days ago
About T-Mobile: T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional service experience.
TMUS Global Solutions: TMUS Global Solutions is a world-class technology powerhouse accelerating the company’s global digital transformation. With a culture built on growth, inclusivity, and global collaboration, the teams here drive innovation at scale, powered by bold thinking. TMUS India Private Limited operates as TMUS Global Solutions.
About the Role: As a Senior Site Reliability Engineer, you will be a key member of the CFL Platform Engineering and Operations team. You will play a pivotal role in building and scaling intelligent infrastructure to support AI/ML applications, enterprise services, and LLM-based platforms. You will contribute to the design and implementation of observability frameworks, automation-first operations, and incident response strategies to ensure reliability, performance, and scalability across production systems.
What You’ll Do:
- Implement and maintain observability, monitoring, and alerting systems for AI platforms and backend services
- Design and support telemetry pipelines, logging infrastructure, and dashboards (Splunk, Prometheus, Grafana, OpenTelemetry)
- Define and monitor SLOs, SLIs, latency, availability, and throughput metrics
- Participate in on-call rotations, incident resolution, root cause analysis, and postmortems
- Improve CI/CD workflows and infrastructure automation using GitLab pipelines
- Optimize and scale infrastructure including Kafka, RMQ, HAProxy, and distributed APIs
- Collaborate with engineering teams on governance, compliance, and secure operations
- Support capacity planning, cost analysis, and tuning for high-scale performance
- Automate repetitive tasks and reduce toil via scripting (Python, Bash, Java)
- Contribute to runbooks, knowledge base articles, and SRE best practice documentation
- Mentor junior engineers and support a culture of operational excellence and reliability
What You’ll Bring:
- Bachelor’s degree in Computer Science, Engineering, or a related technical field
- 4-7 years in SRE, DevOps, platform, or operations engineering roles
- Strong hands-on experience in observability, monitoring, and distributed systems troubleshooting
- Proficiency in scripting languages such as Python, Bash, or PowerShell
- CI/CD experience with GitLab and automation across deployment pipelines
- Solid understanding of SQL and NoSQL systems including Oracle DB and MongoDB
- Familiarity with Kubernetes, container orchestration, and hybrid cloud (Azure, AWS, GCP, OCI)
- Experience working in high-stakes, incident-driven environments
- Strong working knowledge of Splunk, Grafana, Prometheus, and other observability tools
- Understanding of AI/ML systems, inference APIs, and LLM infrastructure is a plus
- Experience in platform compliance, security enforcement, and regulated domains (finance preferred)
Must Have Skills:
- Application & Microservice: Java, Spring Boot, API & Service Design
- Any CI/CD Tools: GitLab Pipeline/Test Automation/GitHub Actions/Jenkins/CircleCI
- App Platform: Docker & Containers (Kubernetes)
- Any Databases: SQL & NoSQL (Cassandra/Oracle/Snowflake/MongoDB)
- Any Messaging: Kafka, RabbitMQ
- Any Observability/Monitoring: Splunk/Grafana/OpenTelemetry/ELK Stack/Datadog/New Relic/Prometheus
- Incident/Change/Problem Management
Nice To Have:
- Multi-region failover (SQL Server, MongoDB, vendors)
- Observability platform design (sampling, retention policies)
- Own domain SLOs and error budgets
- Perf engineering for latency-sensitive apps
- Toil automation (SRE bots, operators
-
Site Reliability Engineer
1 week ago
Hyderabad district, India Sonata Software Full time
Role: Site Reliability Engineer
Location: Hyderabad
Notice Period: Immediate to 20 Days
Employment Type: Full Time
Experience: 7–12 years in site reliability, cloud-based data infrastructure, data pipeline observability, automation, and high-availability engineering within EdTech platforms (2U)
Primary Skills (Must-Have): AWS, CI/CD, Jenkins, IAAC,...
-
DevOps Engineer
3 weeks ago
Hyderabad, India Axceltran digital private limited Full time
Description:
Qualifications:
- Proven experience as a Site Reliability Engineer, Sr DevOps Engineer, or similar role.
- 5 to 7 years of relevant experience, including at least 2 years with Microsoft Azure. AWS and GCP are good to have.
- Experience in setting up and managing OTEL, using Loki, Tempo, Prometheus, Grafana, Alloy, etc.
- Experience in creating CI/CD...
-
Sr Engineer, Site Reliability
3 days ago
Hyderabad district, India TMUS Global Solutions Full time
About T-Mobile: T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional service experience. About TMUS Global...
-
Lead Site Reliability Engineer
6 days ago
Hyderabad district, India Atyeti Inc Full time
Job Description: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our growing team. Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience. 7+ years’ experience in site reliability, successfully deploying and managing large-scale distributed systems. Understanding of SRE concepts (error...
-
Site Reliability Engineer
4 days ago
Bangalore district, India IntraEdge Full time
Job Title: Site Reliability Engineer (SRE) – Production Support
Location: Bengaluru
Job Summary: We are looking for a skilled Site Reliability Engineer (SRE) with strong experience in production support, DevOps practices, and cloud infrastructure management. The ideal candidate will be responsible for maintaining the reliability, performance, and...
-
AWS Site Reliability Engineer
1 week ago
Hyderabad district, India HTC Global Services Full time
HTC – A brief profile: Established in 1990, HTC Inc., a company with headquarters in Troy, Michigan, is a leading global Information Technology solution and BPO provider. HTC assists clients across multiple industry verticals, offering turnkey project lifecycle in e-business, data warehousing, embedded systems, ECM, SCM, CRM, and ERP solutions. HTC Inc....
-
Site Reliability Engineer
4 days ago
Hyderabad, Telangana, India Oracle Financial Services Software Ltd Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Principal Site Reliability Engineer: Oracle is seeking a motivated Principal Site Reliability Engineer who thrives in a fast-paced, rapidly evolving technology environment. This position requires broad knowledge of Linux administration, AI technologies, software development, cloud computing, networking, cloud security, performance analysis and...
-
Site Reliability Engineer
1 week ago
Hyderabad district, India GSPANN Technologies, Inc Full time
About Company: Headquartered in California, U.S.A., GSPANN provides consulting and IT services to global clients. We help clients transform how they deliver business value by helping them optimize their IT capabilities, practices, and operations with our experience in retail, high-technology, and manufacturing. With five global delivery centers and 2000+...
-
Senior Site Reliability Engineer
18 hours ago
Hyderabad, India Insight Global, LLC Full time
Job Title: Sr. SRE
About the Company: Insight Global's Client
Type: Ongoing EOR, depending on experience level
Location: ONSITE 4X/WEEK in HITEC City, Hyderabad, IN
Priority scheduling for candidates who:
- Submit resume promptly
- Are available for immediate interviews
- Connect via LinkedIn with resume and CTC rate
Requirements:
- Ability to be onsite...
-
Engineer, Site Reliability
3 days ago
Hyderabad district, India TMUS Global Solutions Full time
About T-Mobile: T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional service experience. About TMUS Global...