Principal Site Reliability Engineer
1 week ago
Business Unit
Cubic Transportation Systems
Company Details
When you join Cubic, you become part of a company that creates and delivers technology solutions in transportation to make people's lives easier by simplifying their daily journeys, and defense capabilities to help promote mission success and safety for those who serve their nation. Led by our talented teams around the world, Cubic is committed to solving global issues through innovation and service to our customers and partners.
We have a top-tier portfolio of businesses, including Cubic Transportation Systems (CTS) and Cubic Defense (CD).
Job Details
The Senior Site Reliability Engineer is a leader within the team, responsible for designing, building, and owning the complex infrastructure and deployment systems that underpin our live environments. This role is both hands-on and strategic, requiring deep technical expertise and strong collaboration skills. You will mentor junior engineers and work closely with development teams to architect and implement systems that are reliable, scalable, and highly automated. Senior SREs are expected to drive the adoption of robust, automated solutions and ensure those solutions are well-documented and understood across engineering.
Core Responsibilities
Infrastructure Design & Maintenance
- Lead the design, build, and maintenance of our core infrastructure using infrastructure-as-code (IaC) tools (e.g., Terraform, CloudFormation).
- Own the provisioning and lifecycle management of production, staging, and other critical environments.
- Architect and implement shared infrastructure components (e.g., logging, metrics, service mesh, load balancing).
- Drive continuous improvements to infrastructure scalability, availability, and performance.
- Act as a key partner to development teams, providing infrastructure primitives and strategic guidance on deployment needs.
Deployment Systems & CI/CD
- Design, own, and enhance our CI/CD pipelines (GitHub Actions, Argo CD) to maximize reliability, velocity, and automation.
- Establish and enforce best practices across all environments for deployment, rollback, and observability.
- Partner with developers to architect and streamline the testing and delivery of code to production.
- Champion the elimination of manual steps in deployment and operations workflows.
Reliability, Observability & Tooling
- Architect and manage our monitoring, alerting, and logging infrastructure (Kube-Prometheus-Grafana stack).
- Define, implement, and track SLOs/SLIs for core services, holding service owners accountable.
- Proactively identify and eliminate single points of failure, performance bottlenecks, and sources of instability.
- Lead reliability reviews, blameless post-incident analyses, and capacity planning initiatives.
- Perform basic debugging of Java applications to assist development teams in troubleshooting.
Documentation & Knowledge Sharing
- Ensure all systems and processes built or maintained by the SRE team are accompanied by thorough, up-to-date documentation.
- Mentor other engineers and contribute to shared knowledge bases, runbooks, and developer-facing materials.
- Lead internal training sessions, walkthroughs, and pairings to cross-train teammates and reduce knowledge silos.
Collaboration & Culture
- Work closely with the SRE Lead to define team strategy, prioritize work, and execute on team goals.
- Mentor junior team members and act as a technical leader across engineering.
- Participate in on-call rotations, acting as an escalation point for complex issues.
- Champion a culture of blameless learning, transparency, and continuous improvement.
Qualifications & Skills
- Experience: 8+ years in a senior SRE, DevOps, or related infrastructure role.
- Cloud: Deep, hands-on expertise with AWS, including services like ECS, EKS, Aurora (Postgres), EC2, S3, and VPC.
- Containers & Orchestration: Strong, production-level proficiency with Kubernetes and Helm. Deep understanding of container runtimes and networking.
- CI/CD: Extensive experience designing, building, and managing complex CI/CD pipelines using tools like GitHub Actions and Argo CD. Experience with container registries like GHCR.
- IaC: Expertise in Infrastructure as Code, with strong proficiency in Terraform or CloudFormation.
- Observability: Proven experience with observability stacks, particularly the Kube-Prometheus-Grafana stack, including custom metric instrumentation and advanced dashboarding.
- Debugging: Ability to perform basic performance analysis and debugging of applications (Java experience is a strong plus).
- Leadership: Demonstrated ability to mentor junior engineers, lead technical projects, and drive architectural decisions.
- Incident Management: Experience leading incident response, conducting blameless post-mortems, and driving resulting action items to completion.
Worker Type
Employee
-
Principal Site Reliability Engineer
1 week ago
Hyderabad, Telangana, India Cubic Transportation Systems Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Hiring Principal Site Reliability Engineer
Experience: 12+ Years
Location: Hyderabad
Notice: Immediate to 30 Days
We're seeking an experienced Site Reliability Engineer (SRE) to ensure our services are robust, scalable, secure, and maintainable. You will blend software engineering and systems operations to automate processes, monitor performance, lead incident...
-
Site Reliability Engineer
1 week ago
Hyderabad, Telangana, India IntraEdge Full time ₹ 15,00,000 - ₹ 25,00,000 per year
Site Reliability Engineer
Experience: 7+ Years
Location: Hyderabad
Skills for Principal: Strong leadership and people management skills. Exceptional technical proficiency in Pearson's technology stack. Advanced project management capabilities. Excellent communication and collaboration skills. Adept at risk assessment and crisis management. Strategic thinking with a...
-
Principal Site Reliability Engineer
1 week ago
Hyderabad, Telangana, India Cubic Transportation Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Hiring Principal Site Reliability Engineer
Experience: 12 to 18 Years
Location: Hyderabad
Notice Period: Immediate to 30 Days
Key Responsibilities
- Design, deploy, and maintain scalable, secure applications and infrastructure in cloud or hybrid environments
- Implement and manage robust monitoring, alerting, and observability systems
- Automate recurrent operational...
-
Site Reliability Engineer
4 weeks ago
Hyderabad, Telangana, India IntraEdge Full time
Site Reliability Engineer
Experience: 7+ Years
Location: Hyderabad
Hybrid: 4 days in office and 1 day remote
Skills for Principal: Strong leadership and people management skills. Exceptional technical proficiency in Pearson's technology stack. Advanced project management capabilities. Excellent communication and collaboration skills. Adept at risk assessment and crisis...
-
Site Reliability Engineer
1 week ago
Hyderabad, Telangana, India Oracle Financial Services Software Ltd Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Senior Principal Site Reliability Engineer, Fusion SRE
About Oracle Cloud: Oracle Cloud is a comprehensive suite of cloud services, including infrastructure, platform, and applications, designed to help organizations build, deploy, and manage workloads securely at scale. At Oracle, we are building the most intelligent future of cloud computing. Our...
-
Principal Site Reliability Engineer
3 days ago
Hyderabad, Telangana, India Amgen Inc Full time ₹ 8,00,000 - ₹ 12,00,000 per year
We are looking for a Site Reliability Engineer/Cloud Engineer (SRE) to work on the performance optimization, standardization, and automation of Amgen's critical infrastructure and systems. This role is crucial to ensuring the reliability, scalability, and cost-effectiveness of our production systems. The ideal candidate will work on operational excellence...
-
Principal Engineer, Site Reliability T500-20232
4 weeks ago
Hyderabad, Telangana, India ANSR Full time
ANSR is hiring for one of its clients.
About T-Mobile: T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America's supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional...
-
Site Reliability Engineering
2 weeks ago
Hyderabad, Telangana, India Acesoft Labs Full time ₹ 20,00,000 - ₹ 25,00,000 per year
Hi, kindly find the JD below:
Job Title: Site Reliability Engineering (SRE) Manager
Location: Hyderabad
Employment Type: Full-Time
Work Model: 3 days from office (Hybrid)
Summary: The SRE Manager at TechBlocks India will lead the reliability engineering function, ensuring infrastructure resiliency and optimal operational performance. This hybrid role blends...
-
Site Reliability Engineering
2 weeks ago
Hyderabad, Telangana, India TECHBLOCKS Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Job Title: Site Reliability Engineering (SRE) Manager
Location: Hyderabad
Employment Type: Full-Time
Work Model: 3 days from office (Hybrid)
Summary: The SRE Manager at TechBlocks India will lead the reliability engineering function, ensuring infrastructure resiliency and optimal operational performance. This hybrid role blends technical leadership with team...