
Site Reliability Engineer
3 weeks ago
ESSENTIAL DUTIES AND RESPONSIBILITIES
Take a purist SRE approach to shared multi-tenant infrastructure for resilient SaaS microservice-based containerized systems in addition to customer-centric application environments
Oversee and automate the team’s growing presence in AWS
Contribute to core infrastructure systems development with features, bug fixes, reliability improvements, etc.
Platform reliability engineering of a complex single sign-on SAML/OAuth-based central authentication platform
Creatively build and develop tooling to aid in driving 24x7x365 follow-the-sun operations of critical production systems
Automate deployment tasks for core product and infrastructure tools and maintain automation infrastructure
Create system documentation and training materials to empower and educate our fellow team members
Build and maintain observability tooling, metrics, and dashboarding for a global platform product infrastructure
Improve our incident management lifecycle to identify, mitigate, and learn from reliability risks and issues
Enhance platform observability by helping create a self-healing approach to platform reliability
Collaborate with engineering teams, providing product feedback and, where necessary, contributing code to the product
REQUIRED SKILLS AND EXPERIENCE
Education and Work Experience
Bachelor’s Degree in Computer Science or related field.
Software engineering and task automation skills with Bash, Python, and/or Go are a must.
Familiarity with the Agile software development lifecycle.
Deep background with Linux systems and engineering.
Highly experienced with engineering and automating on Amazon Web Services (AWS).
Experience supporting web applications running on Java / Apache / Tomcat in a live production environment.
Prior experience with IaC tools like Terraform/Terragrunt/Terraspace.
Prior experience with DevOps/GitOps tools (Git, Bitbucket, Flux CD, TeamCity) for gate promotions.
Production-at-scale support background in a heavily microservice-based world.
Hands-on engineering and ops expertise in containerization (Docker, Helm, Kubernetes/EKS, CNI and Ingress networking).
Strong understanding of Single Sign-On, SAML, OAuth (bonus if hands-on experience with Okta).
Seasoned expertise around certificate technology and basic concepts of encryption.
Experience working with relational databases such as Aurora Postgres and/or Oracle RDS.
Advanced exposure to application development, web UI (design and development), JSON, application architecture.
Strong experience with observability tools (logging/APM) like Datadog, CloudWatch, and PagerDuty.
Familiarity with event store/stream-processing technologies like Kafka or AWS SQS.
Understanding of Open Application Model systems such as KubeVela or Crossplane.
Personal Qualities and Soft Skills
You greatly prefer writing code to clicking through a GUI.
You enjoy teaching, being a mentor to others, and working across boundaries.
Outstanding troubleshooting skills; ability to think critically and display an aptitude for problem solving.
Strong analytical mind with a penchant for process development and enhancement.
A highly positive can-do attitude with a desire to be a team player.
Great communication skills and ability to explain complex technical concepts to a varied audience.
Demonstrate strong follow-through and a strong work ethic, and consistently keep and meet commitments.
Other Requirements
Ability to read, write, and speak English.
We provide 24x7 support to our customers, so we expect you to take turns with your teammates being on-call for weekend production emergencies or to provide rotating weekend operational support.
Travel – Expect occasional travel (less than 5%) to other Guidewire offices for training and team meetings.
-
Site Reliability Engineer
3 days ago
Bengaluru, Karnataka, India AppHelix Full time ₹ 9,00,000 - ₹ 12,00,000 per year
Role Description
This is a full-time on-site role located in Bengaluru for a Site Reliability Engineer. The Site Reliability Engineer will be responsible for maintaining and improving the reliability of AppHelix's systems. Daily tasks include monitoring system performance, troubleshooting issues, managing infrastructure, and supporting software development....
-
Site Reliability Engineer
2 days ago
Bengaluru, Karnataka, India FIS Full time ₹ 12,00,000 - ₹ 36,00,000 per year
About the Role: Site Reliability Engineer (SRE) with deep expertise in Mainframe technologies like COBOL, JCL, etc. to support and enhance our Card Management & Payment processing functions. This role will be responsible for ensuring reliability, high availability, scalability, stability and performance of mission-critical mainframe software applications and...
-
Site Reliability Engineer
4 days ago
Bengaluru, India IntraEdge Full time
Job Title: Site Reliability Engineer (SRE) – Production Support
Location: Bengaluru
Job Summary: We are looking for a skilled Site Reliability Engineer (SRE) with strong experience in production support, DevOps practices, and cloud infrastructure management. The ideal candidate will be responsible for maintaining the reliability, performance, and...
-
Site Reliability Engineer
6 days ago
Bengaluru, India IntraEdge Full time
Job Title: Site Reliability Engineer (SRE) – Production Support
Location: Bengaluru
Job Summary: We are looking for a skilled Site Reliability Engineer (SRE) with strong experience in production support, DevOps practices, and cloud infrastructure management. The ideal candidate will be responsible for maintaining the reliability, performance, and...
-
Site Reliability Engineer
5 days ago
Bengaluru, India IntraEdge Full time
Job Title: Site Reliability Engineer (SRE) – Production Support
Location: Bengaluru
Job Summary: We are looking for a skilled Site Reliability Engineer (SRE) with strong experience in production support, DevOps practices, and cloud infrastructure management. The ideal candidate will be responsible for maintaining the reliability, performance, and scalability...
-
Site Reliability Engineer
5 days ago
Bengaluru, India IntraEdge Full time
Job Title: Site Reliability Engineer (SRE) – Production Support
Location: Bengaluru
Job Summary: We are looking for a skilled Site Reliability Engineer (SRE) with strong experience in production support, DevOps practices, and cloud infrastructure management. The ideal candidate will be responsible for maintaining the reliability, performance, and...
-
Site Reliability Engineer
2 days ago
Bengaluru, India IntraEdge Full time
Job Title: Site Reliability Engineer (SRE) – Production Support
Location: Bengaluru
Job Summary: We are looking for a skilled Site Reliability Engineer (SRE) with strong experience in production support, DevOps practices, and cloud infrastructure management. The ideal candidate will be responsible for maintaining the reliability, performance, and scalability...
-
Site Reliability Engineer
2 weeks ago
Bengaluru, India HDFC Limited Full time
Hiring for Lead / Sr Site Reliability Engineer for Mumbai & Bangalore Location
Experience: 8 - 14 Years
Job Purpose
Analysing, troubleshooting, and designing vital services, platforms, and infrastructure on GCP while always thinking about reliability, scalability, resilience, security, and performance.
Job Responsibilities: Help build a Site Reliability...
-
Site Reliability Engineer
2 weeks ago
Bengaluru, India HDFC Limited Full time
Hiring for Lead / Sr Site Reliability Engineer for Mumbai & Bangalore Location
Experience: 8 - 14 Years
Job Purpose
Analysing, troubleshooting, and designing vital services, platforms, and infrastructure on GCP while always thinking about reliability, scalability, resilience, security, and performance.
Job Responsibilities: Help build a Site Reliability...
-
Site Reliability Engineer
2 weeks ago
Bengaluru, India HDFC Limited Full time
Hiring for Lead / Sr Site Reliability Engineer for Mumbai & Bangalore Location
Experience: 8 - 14 Years
Job Purpose
Analysing, troubleshooting, and designing vital services, platforms, and infrastructure on GCP while always thinking about reliability, scalability, resilience, security, and performance.
Job Responsibilities: Help build a Site Reliability...