Site Reliability Engineer
2 weeks ago
We have a top-tier portfolio of businesses, including Cubic Transportation Systems (CTS) and Cubic Defense (CD).
Job Details:
The Junior Site Reliability Engineer is responsible for assisting in the design, build, and maintenance of the infrastructure and deployment systems that underpin our live environments. This role is hands-on and highly collaborative, working closely with development teams and senior SREs to ensure our systems are reliable, scalable, and well-instrumented. Junior SREs are expected to learn and apply best practices in building robust, automated solutions, and to ensure their work is repeatable and understandable by others. Every contribution should be accompanied by documentation to support knowledge-sharing within the team and across engineering.
Core Responsibilities
- Infrastructure Design & Maintenance
- Assist in building and maintaining infrastructure using infrastructure-as-code (IaC) tools (e.g., Terraform, CloudFormation).
- Support the provisioning and lifecycle management of production, staging, and other critical environments.
- Help implement shared infrastructure components (e.g., logging, metrics, service mesh, load balancing).
- Contribute to improving infrastructure scalability, availability, and performance under the guidance of senior engineers.
- Collaborate with development teams to provide infrastructure support for their deployment needs.
- Deployment Systems & CI/CD
- Support and help extend CI/CD pipelines (GitHub Actions, Argo CD) to improve reliability and automation of deployments.
- Help promote consistency and best practices across environments for deployment, rollback, and observability.
- Work with developers to streamline testing and delivery of code to production.
- Assist in reducing manual steps in the deployment and operations workflows.
- Reliability, Observability & Tooling
- Assist in the implementation and maintenance of our monitoring, alerting, and logging infrastructure (Kube-Prometheus-Grafana stack).
- Help track SLOs/SLIs for core services in partnership with service owners.
- Learn to identify and help eliminate single points of failure, performance bottlenecks, and sources of instability.
- Participate in reliability reviews and post-incident analysis.
- Documentation & Knowledge Sharing
- Ensure that all systems and processes you work on are accompanied by thorough, up-to-date documentation.
- Contribute to shared knowledge bases, runbooks, and developer-facing onboarding materials.
- Participate in internal training sessions and pairings to learn from teammates.
- Collaboration & Culture
- Work closely with the SRE Lead and other team members to execute work aligned with team goals.
- Engage constructively with other teams across engineering.
- Participate in on-call rotations with strong support from senior members.
- Embrace a culture of blameless learning, transparency, and continuous improvement.
Requirements
- Experience: 3+ years in a DevOps, SRE, or related role.
- Cloud: Basic understanding of cloud computing concepts, with some hands-on experience in AWS.
- Containers & Orchestration: Familiarity with Docker and a foundational understanding of Kubernetes concepts. Experience with AWS ECS is a plus.
- CI/CD: Exposure to CI/CD principles and tools like GitHub Actions. Familiarity with Argo CD is a bonus.
- IaC: Some experience with or exposure to Infrastructure as Code tools like Terraform or CloudFormation.
- Scripting: Proficiency in at least one scripting language (e.g., Bash, Python).
- Observability: A basic understanding of monitoring and logging. Exposure to Prometheus and Grafana is desirable.
- Collaboration: Strong communication skills and a desire to learn and work within a team.
- Problem Solving: An enthusiastic and curious approach to solving technical challenges.
-
Site Reliability Engineer
7 days ago
Hyderabad, Telangana, India Assurant Full time
Site Reliability Engineer, GCC-Assurant
The Site Reliability Engineer (SRE) will be part of the Assurant Reliability Team, specifically within the Site Reliability Engineering area. This remote position, based in India, focuses on building and maintaining reliable, scalable systems through a combination of software development and network diagnostics. The...
-
Site Reliability Engineer
1 week ago
Hyderabad, Telangana, India Assurant Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Site Reliability Engineer, GCC-Assurant
The Site Reliability Engineer (SRE) will be part of the Assurant Reliability Team, specifically within the Site Reliability Engineering area. This remote position, based in India, focuses on building and maintaining reliable, scalable systems through a combination of software development and network diagnostics. The...
-
Site Reliability Engineer
7 days ago
Hyderabad, Telangana, India Elios Talent Full time
Site Reliability Engineer
Key Highlights
- Build, automate, and support cloud-native infrastructure powering high-availability platforms
- Contribute to automation-first engineering across AWS, Terraform, CI/CD, and observability tooling
- Improve reliability, uptime, system health, and performance across production environments
- Strengthen DevSecOps...
-
Principal Site Reliability Engineer
1 day ago
Hyderabad, Telangana, India Oracle Full time
Oracle is seeking a motivated Principal Site Reliability Engineer who thrives in a fast-paced, rapidly evolving technology environment. This position requires broad knowledge of Mainframe zLinux, DB2, zVM, and AIX. The Site Reliability Engineer is expected to work with multiple service and product development teams, identifying cross-team issues that...
-
Senior Site Reliability Engineer
22 hours ago
Hyderabad, Telangana, India Jade Global Full time
Job Title: Senior Site Reliability Engineer (SRE) – Datadog Observability
Experience Required: 8+ years overall in SRE and Infrastructure Operations, with at least 3 years of hands-on experience in Datadog
Location: Hyderabad preferred, but open to Pune and remote
Job Summary: We are seeking an...
-
Site Reliability Engineer
2 weeks ago
Hyderabad, Telangana, India VXI Global Solutions Full time
We are looking for a Site Reliability Engineer with 3+ years of experience to design, implement, and manage robust observability solutions across our cloud infrastructure and applications. The ideal candidate will have hands-on experience with Prometheus, Grafana, Google Cloud Monitoring, and OpenTelemetry, along with exposure to SolarWinds. You should be...
-
Site Reliability Engineer
1 week ago
Hyderabad, Telangana, India NationsBenefits India Full time
Job Title: Site Reliability Engineer (SRE) | Fintech | Kubernetes | Datadog | 24/7 Support
Department: Site Reliability Engineering
Location: Hyderabad, India
Employment Type: Full-Time
Notice period: 0-15 Days
We’re hiring a Site Reliability Engineer to join our SRE team focused on maintaining the performance, reliability, and availability of our fintech...
-
Senior Site Reliability Engineer
1 day ago
Hyderabad, Telangana, India Jade Global Full time
Job Title: Senior Site Reliability Engineer (SRE) – Datadog Observability
Experience Required: 8+ years overall in SRE and Infrastructure Operations, with at least 3 years of hands-on experience in Datadog
Location: Hyderabad preferred, but open to Pune and remote
Job Summary: We are seeking an experienced Site Reliability Engineer (SRE) to lead end-to-end SRE...
-
Site Reliability Engineer III
2 weeks ago
Hyderabad, Telangana, India JPMorganChase Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Description
There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems.
As a Site Reliability Engineer III at JPMorgan Chase within the Corporate Oversight & Governance Team - Regulatory Controls Ops Risk...
-
Principal Site Reliability Engineer
5 days ago
Hyderabad, Telangana, India Oracle Full time
Oracle is seeking a motivated Principal Site Reliability Engineer who thrives in a fast-paced, rapidly evolving technology environment. This position requires broad knowledge of Linux administration, AI technologies, software development, cloud computing, networking, cloud security, performance analysis and monitoring to provide the stability,...