
Senior Site Reliability Engineer
6 days ago
About the Role:
As a Senior Staff Site Reliability Engineer (SRE) at SolarWinds, you will drive the reliability, scalability, and performance of our Observability Platform. This role focuses on managing SaaS infrastructure at scale, improving system reliability through cloud-native architecture, advanced data platform operations, and automation. You will collaborate with engineering, security, and product teams to ensure operational excellence and lead a team of SREs to maintain high service standards.
Key Responsibilities:
- Lead the design, deployment, and operation of SaaS infrastructure ensuring high availability and reliability.
- Build, operate, and scale Kubernetes clusters (EKS, GKE, AKS, OpenShift) in production.
- Design and manage data platform infrastructure including Kafka, ClickHouse, and event-driven systems.
- Implement cloud-native design patterns and scalable architectures across AWS and Azure.
- Automate infrastructure provisioning and deployment using Terraform, Helm, ArgoCD, CloudFormation, and other Infrastructure as Code (IaC) tools.
- Maintain monitoring, logging, and observability systems using Prometheus, Grafana, Datadog, CloudWatch, ELK/OpenSearch, and OTel/Jaeger.
- Develop and maintain disaster recovery plans and high availability strategies.
- Lead incident response, conduct blameless postmortems, and implement preventive measures.
- Mentor and guide SRE and DevOps engineers to improve team efficiency and adherence to best practices.
- Collaborate cross-functionally with engineering, product, and security teams to optimize system performance, reliability, and cost efficiency.
Must Have:
- 13+ years in SRE, DevOps, Platform Engineering, or equivalent roles.
- 8+ years in SaaS infrastructure management, cloud-native system design, and production operations.
- 5+ years managing Kubernetes clusters at scale in production environments.
- Hands-on experience with data platforms: Kafka, ClickHouse, or similar.
- Strong programming/scripting in Python, Go, Bash, or equivalent.
- Infrastructure automation using Terraform, Helm, ArgoCD, CloudFormation.
- Experience with CI/CD pipelines, GitOps, and deployment automation.
- Expertise in observability: monitoring, logging, tracing (Prometheus, Grafana, Datadog, CloudWatch, ELK/OpenSearch, OTel/Jaeger).
- Strong understanding of disaster recovery principles and high availability architectures.
- Security operations knowledge: IAM, encryption, cloud security best practices.
- Proven leadership and mentoring experience in SRE/DevOps teams.
Preferred Qualifications:
- Experience with Karpenter or KEDA for Kubernetes autoscaling.
- Experience managing distributed SaaS services across multiple regions.
- Familiarity with FinOps or cloud cost optimization.
- Experience with protocol buffers (Buf), event-driven system optimizations, or cloud-native databases.
- Knowledge of container orchestration patterns (ECS, EKS, GKE, AKS, OpenShift).
-
Senior Site Reliability Engineer
7 days ago
Bengaluru, Karnataka, India Akamai Full time
Job Category: Site Reliability. Would you like to lead modernization initiatives while building a public cloud platform from scratch? Would you like to own critical services in a new public cloud platform? Join our IaaS Site Reliability Engineering (SRE) team. We design, develop, and operate infrastructure and services that power the backbone of our...
-
Site Reliability Engineer
2 weeks ago
Bengaluru, Karnataka, India WhiteLotus Talent Partners Full time
We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high system...
-
Site Reliability Engineer
6 days ago
Bengaluru, Karnataka, India WhiteLotus Talent Partners Full time
We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high system...
-
Senior Site Reliability Engineer
1 week ago
Bengaluru, Karnataka, India Josys Full time US$ 1,50,000 - US$ 2,00,000 per year
Senior Site Reliability Engineer (SRE)
About JOSYS: Josys, a dynamic B2B SaaS platform startup, has embarked on a mission to revolutionize IT operations globally, following an exceptional launch in Japan and securing $125 million in Series A and B funding. Our platform enables businesses to conquer the complexities of work-from-anywhere setups, rapid digital...
-
Site Reliability Engineer
15 hours ago
Bengaluru, Karnataka, India Synechron Full time
We have an immediate opportunity for an SRE (Senior Site Reliability Engineer) with 5 to 9 years of experience. Job Role: SRE (Senior Site Reliability Engineer). We began life in 2001 as a small, self-funded team of technology specialists. Innovative tech solutions for business. We're now a leading global digital consulting firm, providing innovative technology solutions for...
-
Senior Site Reliability Engineer
5 days ago
Bengaluru, Karnataka, India LanceSoft, Inc. Full time ₹ 6,00,000 - ₹ 8,00,000 per year
Role Description: This is a full-time on-site role for a Senior Site Reliability Engineer based in Bangalore/Chennai/Pune. The Senior Site Reliability Engineer will be responsible for maintaining and enhancing the reliability and performance of the company's IT infrastructure & Development. Daily tasks include troubleshooting system issues, ensuring system...
-
Senior Site Reliability Engineer
5 days ago
Bengaluru, Karnataka, India CloudHire Full time
Job Summary: The Technical Manager for Site Reliability Engineering (SRE) will lead a remote team of Site Reliability Engineers, ensuring operational excellence and fostering a high-performing team culture. Reporting to the US-based Director of Systems and Security, this role is responsible for overseeing day-to-day operations, technical mentorship, and...
-
Senior Site Reliability Engineer
4 days ago
Bengaluru, Karnataka, India beBeeSiteReliability Full time ₹ 20,00,000 - ₹ 30,00,000
As a senior site reliability engineer, you will play a critical role in ensuring the stability and scalability of financial platforms.
Key Responsibilities:
- Ensure defined SLAs, SLOs, and SLIs are met for performance, reliability, and uptime.
- Build automation for deployments, monitoring, scaling, and self-healing capabilities to reduce manual effort and...
-
Senior Site Reliability Engineer
16 hours ago
Bengaluru, Karnataka, India Procore Full time ₹ 5,00,000 - ₹ 8,00,000 per year
Job Description: We're looking for a Senior Site Reliability Engineer to join Procore's Product & Technology Team. Procore software solutions aim to improve the lives of everyone in construction and the people within Product & Technology are the driving force behind our innovative, top-rated global platform. We're a customer-centric group that encompasses...
-
Senior Site Reliability Engineer
1 day ago
Bengaluru, Karnataka, India Procore Technologies Full time ₹ 1,04,000 - ₹ 1,30,878 per year
Job Description: We're looking for a Senior Site Reliability Engineer to join Procore's Product & Technology Team. Procore software solutions aim to improve the lives of everyone in construction and the people within Product & Technology are the driving force behind our innovative, top-rated global platform. We're a customer-centric group that encompasses...