
Principal Site Reliability Engineer
2 days ago
Roles & Responsibilities:
- Talent Management & Team Leadership: Lead, mentor, empower, and manage a hard-working engineering team of 5-10 people to deliver exceptional results.
- System Reliability, Performance Optimization & Cost Reduction: Ensure the reliability, scalability, and performance of Amgen's infrastructure, platforms, and applications. Proactively identify and resolve performance bottlenecks and implement long-term fixes. Continuously evaluate system design and usage to find opportunities for cost optimization, ensuring infrastructure efficiency without compromising reliability.
- Automation & Infrastructure as Code (IaC): Drive the adoption of automation and Infrastructure as Code (IaC) across the organization to streamline operations, minimize manual interventions, and enhance scalability. Implement tools and frameworks (such as Terraform, Ansible, or Kubernetes) that increase efficiency and reduce infrastructure costs through optimized resource utilization.
- Standardization of Processes & Tools: Establish standardized operational processes, tools, and frameworks across Amgen's technology stack to ensure consistency, maintainability, and best-in-class reliability practices. Champion the use of industry standards to optimize performance and increase operational efficiency.
- Monitoring, Incident Management & Continuous Improvement: Implement and maintain comprehensive monitoring, alerting, and logging systems to detect issues early and ensure rapid incident response. Lead the incident management process to minimize downtime, conduct root cause analysis, and implement preventive measures to avoid recurrence. Foster a culture of continuous improvement by using data from incidents and performance monitoring.
- Collaboration & Cross-Functional Leadership: Partner with software engineering, DevOps, and IT teams to integrate reliability, performance optimization, and cost-saving strategies throughout the development lifecycle. Act as a domain expert in SRE principles and advocate for standard methodologies across all teams.
- Capacity Planning & Disaster Recovery: Develop and implement capacity planning processes to support future growth, performance, and cost management. Maintain disaster recovery strategies to ensure system reliability and minimize downtime in the event of failures.
What we expect of you
We are all different, yet we all use our unique contributions to serve patients.
Basic Qualifications:
- Master's degree and 8 to 10 years of experience in Computer Science, Engineering, or a related field OR
- Bachelor's degree and 10 to 14 years of experience in Computer Science, Engineering, or a related field OR
- Diploma and 14 to 18 years of experience in Computer Science, Engineering, or a related field
Preferred Qualifications:
- Performance Tuning & Cost Optimization: Expertise in identifying performance bottlenecks in large-scale distributed systems and implementing optimization strategies. Experience with cost management in cloud environments (AWS, Azure) to drive cost-effective infrastructure decisions.
- Automation Tools & Infrastructure as Code: Deep expertise with automation tools such as Terraform, Ansible, or Puppet, and hands-on experience with Infrastructure as Code (IaC) to automate infrastructure provisioning and maintenance, enhancing both performance and cost efficiency.
- Monitoring & Incident Management: Proficient in deploying and managing production monitoring solutions such as Dynatrace, Datadog, or New Relic to maintain high system performance and ensure rapid incident response. Proven experience with incident management.
- Standardization & Best Practices: Strong background in creating and enforcing standardized processes, coding practices, and frameworks to ensure consistency, scalability, and improved system performance, and in evangelizing them through cross-team collaboration.
Good-to-Have Skills:
- Experience with containerization (Docker) and orchestration tools (Kubernetes) to optimize resource usage and improve scalability.
- Knowledge of cloud-native technologies and strategies for cost optimization in multi-cloud environments.
- Familiarity with distributed systems, databases, and large-scale system architectures.
Certifications
- AWS Certified DevOps Engineer - Professional: Recognizes sophisticated knowledge of AWS and DevOps standard methodologies to automate and optimize infrastructure and applications in AWS.
- Certified Kubernetes Administrator (CKA): Validates skills required to design, build, and maintain production-grade Kubernetes clusters.