Sr. Site Reliability Engineer

1 day ago

Hyderabad, India Amgen Full time

Job Description

Join Amgen's Mission of Serving Patients

At Amgen, if you feel like you're part of something bigger, it's because you are. Our shared mission, to serve patients living with serious illnesses, drives all that we do. Since 1980, we've helped pioneer the world of biotech in our fight against the world's toughest diseases. With our focus on four therapeutic areas (Oncology, Inflammation, General Medicine, and Rare Disease), we reach millions of patients each year. As a member of the Amgen team, you'll help make a lasting impact on the lives of patients as we research, manufacture, and deliver innovative medicines to help people live longer, fuller, happier lives.

Our award-winning culture is collaborative, innovative, and science based. If you have a passion for challenges and the opportunities that lie within them, you'll thrive as part of the Amgen team. Join us and transform the lives of patients while transforming your career.

Sr. Site Reliability Engineer

What You Will Do

Let's do this. Let's change the world. In this vital role you will help build, scale, and secure the platforms that underpin Amgen's global digital initiatives. This role focuses on ensuring the reliability, performance, and efficiency of cloud-native platforms while enabling development velocity and operational excellence. You will be responsible for designing and operating infrastructure and shared platforms used across the enterprise, including CI/CD, observability, incident management, and collaboration systems. You will work extensively with containerized environments, handle multi-tenant Kubernetes platforms, and automate processes to improve resilience and reduce operational burden. This role requires deep technical expertise, leadership skills, and the ability to drive initiatives across cross-functional teams and global stakeholders.

Roles & Responsibilities:

Platform Reliability Engineering
- Design, operate, and scale secure, highly available cloud-based infrastructure using Infrastructure as Code (IaC).
- Handle multi-tenant container orchestration environments with advanced access controls, workload isolation, and governance policies.
- Ensure enterprise CI/CD platforms are performant, secure, and optimized for high-throughput engineering teams.

Monitoring, Observability & Incident Management
- Build and handle observability platforms for full-stack visibility, leveraging metrics, logs, and traces.
- Define, implement, and continuously refine SLIs, SLOs, and error budgets for platform health and service performance.
- Automate incident response workflows, integrate with incident management platforms, and lead post-incident reviews and root cause analysis.

Enterprise Platform Administration
- Operate and improve core engineering platforms (e.g., CI/CD, collaboration, knowledge sharing) to ensure availability, security, and ease of use.
- Automate platform provisioning, upgrades, access controls, and integration pipelines to reduce manual effort and improve consistency.
- Implement compliance, audit logging, and policy enforcement through code-driven governance models.

AI Adoption & Enablement
- Drive the adoption of AI/ML-based tools to enhance observability, incident prediction, remediation, and intelligent alerting.
- Evaluate and integrate AI-assisted automation platforms to reduce toil and improve operational efficiency.
- Partner with platform, security, and development teams to embed predictive analytics into dashboards, workflows, and root cause tooling.
- Champion a data-driven SRE practice by enabling thoughtful insights and anomaly detection across systems and platforms.

Leadership & Collaboration
- Serve as a technical thought leader and mentor within the SRE organization.
- Promote SRE principles and reliability culture across engineering teams.
- Collaborate with cross-functional stakeholders to influence architecture, roadmaps, and platform investment.
- Lead operational reviews and service health retrospectives, with a focus on continuous improvement.
- Participate in Agile and SAFe delivery processes, including sprint planning, stand-ups, retrospectives, and PI planning, to ensure security and platform reliability are embedded across development cycles.

What We Expect Of You

We are all different, yet we all use our unique contributions to serve patients. The [vital attribute] professional we seek is a [type of person] with these qualifications.

Basic Qualifications:
- Doctorate degree / Master's degree / Bachelor's degree and 8 to 13 years of experience in Computer Science, Information Technology, or a related technical field
- Demonstrated success operating cloud-native infrastructure in production environments
- Practical experience handling Kubernetes clusters and CI/CD environments at enterprise scale
- Exposure to global on-call or incident support rotations
- Excellent collaboration and communication skills across technical and non-technical teams

Preferred Qualifications:

Must-Have Skills:
- Deep experience with cloud platforms (AWS, Azure, or GCP), including services such as compute, networking, IAM, and VPC design
- Proven proficiency in Infrastructure as Code (IaC) using tools such as Terraform or CloudFormation
- Advanced skills in managing container orchestration platforms (e.g., Kubernetes), including workload isolation, resource quotas, and role-based access control
- Strong understanding of Linux system administration, process management, and system performance tuning
- Hands-on experience with CI/CD platforms and pipelines (build automation, artifact storage, environment provisioning, rollback strategies)
- Strong background in observability tooling, including Prometheus, Grafana, Dynatrace, and distributed tracing frameworks like OpenTelemetry or Jaeger
- Strong practical experience with incident management platforms and practices (e.g., alert routing, runbooks, escalation paths)
- Automation and scripting proficiency in languages such as Python, Go, or Bash
- Experience with configuration management tools like Ansible, Chef, or SaltStack
- Strong grasp of networking fundamentals, such as routing, DNS, OSI layers, load balancing, firewalls, TLS, and security groups
- Version control and collaboration workflows using Git and GitOps principles
- Experience with enterprise collaboration platforms, including provisioning, integration, and permission control

Good-to-Have Skills:
- Exposure to service mesh technologies (e.g., Istio, Linkerd) and zero-trust network concepts
- Familiarity with secrets management platforms (e.g., HashiCorp Vault, AWS Secrets Manager)
- Experience using incident response and chaos engineering tools (e.g., Gremlin, Chaos Mesh)
- Background in cost optimization, budgeting, and resource tracking (FinOps)
- Awareness of policy-as-code frameworks (e.g., OPA, Kyverno)
- Familiarity with feature flagging and progressive delivery tools (e.g., LaunchDarkly, Argo Rollouts)
- Integration experience with ticketing and change management platforms (e.g., ServiceNow, Jira)
- Understanding of compliance standards (e.g., HIPAA, GDPR, SOC 2) and how they apply to infrastructure operations
- Understanding of security and encryption technologies and authentication protocols such as OpenID, OIDC, OAuth, SAML, and LDAP

Professional Certifications (Preferred)
- Cloud DevOps Certification (AWS/Azure/GCP)
- Certified Kubernetes Administrator (CKA) or Security Specialist (CKS)
- CI/CD Platform Certification
- ITIL Foundation or equivalent service management certification

Soft Skills:
- High level of ownership and accountability for platform reliability
- Strong diagnostic and analytical capabilities with a bias for action
- Clear and confident communicator with an ability to influence without authority
- Passion for automation, operational excellence, and team mentorship

Shift Information:

This position is an onsite role and may require working during later hours to align with business hours. Candidates must be willing and able to work outside of standard hours as required to meet business needs.

What You Can Expect Of Us

As we work to develop treatments that take care of others, we also work to care for your professional and personal growth and well-being. From our competitive benefits to our collaborative culture, we'll support your journey every step of the way. In addition to the base salary, Amgen offers competitive and comprehensive Total Rewards Plans that are aligned with local industry standards.

Apply now and make a lasting impact with the Amgen team.

careers.amgen.com

As an organization dedicated to improving the quality of life for people around the world, Amgen fosters an inclusive environment of diverse, ethical, committed and highly accomplished people who respect each other and live the Amgen values to continue advancing science to serve patients. Together, we compete in the fight against serious disease.

Amgen is an Equal Opportunity employer and will consider all qualified applicants for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, protected veteran status, disability status, or any other basis protected by applicable law. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.

Sr. Engineer, Site Reliability

24 hours ago

India Intel Full time

Job Description Do you want to innovate an industry-leading developer cloud? Join SATG as a Sr. Engineer, Site Reliability. The cloud development division within Software and Advanced Technology Group (SATG) is developing and shaping the way people think about computing by focusing on developers, ecosystem partners, academia, etc. We are redefining the space...
Senior Site Reliability Engineer

3 weeks ago

Hyderabad, India IntraEdge Full time

Job Description Strong leadership and people management skills. Exceptional technical proficiency in Pearson's technology stack. Strategic thinking with a focus on long-term operational excellence. Champion operational excellence by directing initiatives that elevate system reliability, availability, and overall efficiency. Function as the diplomatic link...
Site Reliability Engineer

1 day ago

Hyderabad, India UBS Full time

Job Description Job Reference # 322870BR Job Type Full Time Your role Are you an analytic thinker? Do you enjoy Site Reliability Engineering initiatives and proactive problem management across on-premises & Cloud Database, ensuring high availability & stability of Database infrastructure services? Do you want to play a key role in transforming our firm into an...
Site Reliability Engineer

1 week ago

Hyderabad, Telangana, India Oracle Financial Services Software Ltd Full time ₹ 12,00,000 - ₹ 36,00,000 per year

Principal Site Reliability Engineer Oracle is seeking a motivated Principal Site Reliability Engineer who thrives in a fast-paced, rapidly evolving technology environment. This position requires broad knowledge of Mainframe zLinux, DB2, zVM, and AIX. The Site Reliability Engineer is expected to work with multiple service and product development teams,...
Senior Site Reliability Engineer

4 hours ago

India IntraEdge Full time

Strong leadership and people management skills. Exceptional technical proficiency in Pearson's technology stack. Strategic thinking with a focus on long-term operational excellence. Champion operational excellence by directing initiatives that elevate system reliability, availability, and overall efficiency. Function as the diplomatic link that binds the SRE...
Sr Engineer Reliability

3 days ago

Jamnagar, India Reliance Industries Limited Full time

Job Description Job Role: Sr Engineer Reliability Job Role ID SECTION I: BASIC INFORMATION ABOUT THE JOB ROLE Job Role Variant: Sr Engineer - Static Manager Job Position: Section Head Reliability Job Position ID Value Stream: Asset Operations Job Family: Engineering Sub-Job Family: CES Grade/Level Location: SECTION II: PURPOSE OF THE ROLE To...
Site Reliability Engineer III

3 days ago

Hyderabad, India Chase Bank Full time

Job Description There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems. As a Site Reliability Engineer III at JPMorgan Chase within the Corporate Oversight & Governance Team - Regulatory Controls Ops...
Site Reliability Engineer

1 week ago

Hyderabad, India Whatjobs IN C2 Full time

Job Title: Site Reliability Engineer (SRE) | Fintech | Kubernetes | Datadog | 24/7 Support Department: Site Reliability Engineering Location: Hyderabad, India Employment Type: Full-Time Notice period: 0-15 Days We’re hiring a Site Reliability Engineer to join our SRE team focused on maintaining the performance, reliability, and availability of our fintech...
Senior Site Reliability Engineer

2 weeks ago

Hyderabad, India Insight Global, LLC Full time

Job Title: Sr. SRE
About the Company: Insight Global's Client
Type: Ongoing EOR, depending on experience level
Location: ONSITE 4X/WEEK in HITEC City, Hyderabad, IN
Priority scheduling for candidates who:
- Submit resume promptly
- Are available for immediate interviews
- Connect via LinkedIn with resume and CTC rate
Requirements:
- Ability to be onsite...
Site Reliability Engineer

3 weeks ago

Bengaluru, India Relanto Full time

Job Description Job Title: Site Reliability Engineer Summary We are looking for a Site Reliability Engineer to join our Digital & Transformation department. The ideal candidate will have 2-3 years of experience in this field and will be responsible for ensuring the reliability, availability, and performance of our systems and applications. Roles And...

