Principal Site Reliability Engineer
2 weeks ago
Job Description

Optum is a global organization that delivers care, aided by technology, to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health optimization on a global scale. Join us to start Caring. Connecting. Growing together.

Primary Responsibilities:
- Provision infrastructure using Terraform in cloud (Azure) environments
- Manage and optimize Azure Cloud Infrastructure, building resilient, self-scaling systems
- Ensure availability, performance, monitoring, and infrastructure provisioning for platforms spanning Cloud (Azure) and On-Prem technologies
- Collaborate with Engineering and Technical Support teams to resolve critical issues
- Automate repeatable tasks to reduce operational toil
- Deploy applications using CI/CD tools and manage the full lifecycle: code repository, scanning, artifact management, compliance, deployment, and configuration
- Partner with development teams to resolve platform-related roadblocks
- Conduct post-mortems and drive continuous improvement after production incidents
- Implement automation, self-healing, and real-time monitoring in production systems
- Participate in cross-functional projects involving Engineering, Cloud, Networking, CI/CD, Monitoring, and Project Management
- Stay current with emerging technologies and drive innovation
- Enhance CI/CD pipelines with automated performance and load testing
- Collaborate with DevOps and QA to integrate performance benchmarks into release gates
- Cloud Architecture & Reliability: design and implement scalable, reliable cloud architectures
- Drive innovation in SRE through AI and automation
- Explore and implement AI-driven solutions for anomaly detection, incident prediction, and intelligent alerting
- Comply with the terms and conditions of the employment contract, company policies and procedures, and any and all directives (such as, but not limited to, transfer and/or re-assignment to different work locations, change in teams and/or work shifts, policies in regard to flexibility of work benefits and/or work environment, alternative work arrangements, and other decisions that may arise due to the changing business environment). The Company may adopt, vary or rescind these policies and directives in its absolute discretion and without any limitation (implied or otherwise) on its ability to do so

Required Qualifications:
- 10+ years in Software Engineering, DevOps, or SRE roles, with 3+ years in a principal or lead capacity
- 5+ years of experience with CI/CD tooling (e.g., GitHub Actions)
- 5+ years of experience with container orchestration in cloud platforms (Azure)
- 5+ years of deep experience in observability and monitoring tools (Prometheus, Grafana)
- 5+ years of experience with Docker and Kubernetes
- 3+ years of hands-on experience with Terraform and Infrastructure as Code
- Experience migrating legacy solutions to Azure/Cloud-hosted environments
- Experience managing and migrating on-premises environments
- Solid scripting and automation skills in Python and PowerShell (Python preferred)
- Security & compliance:
  - Ability to strengthen infrastructure security posture across all environments
  - Ability to conduct regular security assessments and apply best practices
-
Principal Site Reliability Engineer
2 weeks ago
Gurugram, Haryana, India, IN Cvent Full time Cvent is looking for a Principal Site Reliability Engineer to help us scale our systems and ensure the stability, reliability, performance, and rapid deployment of our platform. We build teams that are inclusive, collaborative, and have a strong sense of ownership for the things they build. If you have a passion and track record for solving problems;...
-
Principal Site Reliability Developer
3 weeks ago
Bengaluru, India Oracle Full time Job Description We are seeking a Principal Site Reliability Developer (IC4) to join Oracle Cloud Infrastructure (OCI). This role blends software engineering expertise with site reliability engineering (SRE) principles, ensuring our large-scale distributed systems are reliable, observable, and efficient. As a senior technical leader, you will design and...
-
Principal Site Reliability Developer
3 weeks ago
Hyderabad, India Oracle Full time Job Description We are seeking a Principal Site Reliability Developer (IC4) to join Oracle Cloud Infrastructure (OCI). This role blends software engineering expertise with site reliability engineering (SRE) principles, ensuring our large-scale distributed systems are reliable, observable, and efficient. As a senior technical leader, you will...
-
Site Reliability Engineer
1 week ago
Noida, India NTT Data Full time Job Description NTT DATA strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now. We are currently seeking a Site Reliability Engineer to join our team in Noida, Uttar Pradesh (IN-UP), India (IN). Job Description - Site...
-
Site Reliability Engineer
5 days ago
Noida, India NTT DATA North America Full time Job Description Req ID: 350360 NTT DATA strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now. We are currently seeking a Site Reliability Engineer to join our team in Noida, Uttar Pradesh (IN-UP), India (IN). Role Overview...
-
Site Reliability Engineer
2 days ago
Noida, Uttar Pradesh, India CorroHealth Full time We are seeking a highly skilled Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a deep understanding of both software engineering and systems administration, with a focus on creating scalable and reliable systems. You will work closely with development and operations teams to ensure the reliability, availability, and...
-
Sr Site Reliability Engineer
2 weeks ago
Hyderabad, India GHX Full time Job Description Site Reliability Engineer (SRE) Position Summary: The Site Reliability Engineer (SRE) will be a hands-on contributor within the Site Reliability Engineering Center of Excellence (CoE), responsible for building monitoring and observability solutions, troubleshooting production issues, and participating in 24x7 on-call operations. This role...
-
Site Reliability Engineer
7 days ago
Noida, India Microsoft Full time Job Description Overview: Do you want to work on a product that is used by millions of people around the world daily and is growing rapidly? Do you care deeply about how software is designed with a focus on supporting global scale? Do you want to be part of a world-class team that continuously pushes the boundary of service and engineering excellence? TheWeb...
-
Site Reliability Engineer
6 days ago
Noida, India NTT Full time Job Description Req ID: We are currently seeking a Site Reliability Engineer to join our team in Noida, Uttar Pradesh (IN-UP), India (IN). Job Description – Site Reliability Engineer (5–8 Years Experience) Role Overview: We are seeking an experienced Site Reliability Engineer (SRE) with 5–8 years of expertise in ensuring the reliability,...
-
Site Reliability Engineer
3 weeks ago
India Pagos Consultants Full time We are looking for experienced site reliability engineers to join a founding team of startup-minded individuals that will lay the groundwork for our new fintech offering. This team will play a pivotal role in spearheading innovation. As such, you will have the opportunity to shape the early architecture and design of the system and set the trajectory for its...