Senior site reliability engineer

7 days ago


Bengaluru, India SolarWinds Full time

About the Role:

As a Senior Staff Site Reliability Engineer (SRE) at SolarWinds, you will drive the reliability, scalability, and performance of our Observability Platform. This role focuses on managing SaaS infrastructure at scale and improving system reliability through cloud-native architecture, advanced data platform operations, and automation. You will collaborate with engineering, security, and product teams to ensure operational excellence and lead a team of SREs to maintain high service standards.

Key Responsibilities:

  • Lead the design, deployment, and operation of SaaS infrastructure, ensuring high availability and reliability.
  • Build, operate, and scale Kubernetes clusters (EKS, GKE, AKS, OpenShift) in production.
  • Design and manage data platform infrastructure including Kafka, ClickHouse, and event-driven systems.
  • Implement cloud-native design patterns and scalable architectures across AWS and Azure.
  • Automate infrastructure provisioning and deployment using Terraform, Helm, Argo CD, CloudFormation, and other Infrastructure as Code (IaC) tools.
  • Maintain monitoring, logging, and observability systems using Prometheus, Grafana, Datadog, CloudWatch, ELK/OpenSearch, and OTel/Jaeger.
  • Develop and maintain disaster recovery plans and high availability strategies.
  • Lead incident response, conduct blameless postmortems, and implement preventive measures.
  • Mentor and guide SRE and DevOps engineers to improve team efficiency and adherence to best practices.
  • Collaborate cross-functionally with engineering, product, and security teams to optimize system performance, reliability, and cost efficiency.

MUST HAVE:

  • 13+ years in SRE, DevOps, Platform Engineering, or equivalent roles.
  • 8+ years in SaaS infrastructure management, cloud-native system design, and production operations.
  • 5+ years managing Kubernetes clusters at scale in production environments.
  • Hands-on experience with data platforms: Kafka, ClickHouse, or similar.
  • Strong programming/scripting in Python, Go, Bash, or equivalent.
  • Infrastructure automation using Terraform, Helm, Argo CD, CloudFormation.
  • Experience with CI/CD pipelines, GitOps, and deployment automation.
  • Expertise in observability: monitoring, logging, tracing (Prometheus, Grafana, Datadog, CloudWatch, ELK/OpenSearch, OTel/Jaeger).
  • Strong understanding of disaster recovery principles and high availability architectures.
  • Security operations knowledge: IAM, encryption, cloud security best practices.
  • Proven leadership and mentoring experience in SRE/DevOps teams.

Preferred Qualifications:

  • Experience with Karpenter or KEDA for Kubernetes autoscaling.
  • Experience managing distributed SaaS services across multiple regions.
  • Familiarity with FinOps or cloud cost optimization.
  • Experience with protocol buffers (Buf), event-driven system optimizations, or cloud-native databases.
  • Knowledge of container orchestration patterns (ECS, EKS, GKE, AKS, OpenShift).



  • Bengaluru, Karnataka, India Akamai Full time

    Job Category: Site Reliability. Would you like to lead modernization initiatives while building a public cloud platform from scratch? Would you like to own critical services in a new public cloud platform? Join our IaaS Site Reliability Engineering (SRE) team. We design, develop, and operate infrastructure and services that power the backbone of our...


  • Bengaluru, Karnataka, India WhiteLotus Talent Partners Full time

    We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high system...


  • Bengaluru, Karnataka, India Synechron Full time

    We have an immediate opportunity for SRE (Senior Site Reliability Engineer), 5 to 9 years. Job Role: SRE (Senior Site Reliability Engineer). We began life in 2001 as a small, self-funded team of technology specialists. Innovative tech solutions for business. We're now a leading global digital consulting firm, providing innovative technology solutions for...


  • Bengaluru, Karnataka, India WhiteLotus Talent Partners Full time ₹ 9,00,000 - ₹ 12,00,000 per year

    We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high system availability,...


  • Bengaluru, India WhiteLotus Talent Partners Full time

    We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high system...


  • Bengaluru, India Synechron Full time

    We have an immediate opportunity for SRE (Senior Site Reliability Engineer), 5 to 9 years. Synechron – Bangalore. Job Role: SRE (Senior Site Reliability Engineer). Job Location: Bangalore. Notice Period: Within 30 days. About Synechron: We began life in 2001 as a small, self-funded team of technology specialists. Since then, we’ve grown our organization to...


  • Bengaluru, India Synechron Full time

    We have an immediate opportunity for SRE (Senior Site Reliability Engineer), 5 to 9 years. Synechron – Bangalore. Job Role: SRE (Senior Site Reliability Engineer). Job Location: Bangalore. Notice Period: Within 30 days. About Synechron: We began life in 2001 as a small, self-funded team of technology specialists. Since then, we’ve grown our organization to 14,500+...