Sr. Staff Site Reliability Engineer

2 weeks ago

Bengaluru, India SolarWinds Full time

At SolarWinds, we’re a people-first company. Our purpose is to enrich the lives of the people we serve, including our employees, customers, shareholders, Partners, and communities. Join us in our mission to help customers accelerate business transformation with simple, powerful, and secure solutions.

The ideal candidate thrives in an innovative, fast-paced environment and is collaborative, accountable, ready, and empathetic. We’re looking for individuals who believe they can accomplish more as a team and create lasting growth for themselves and others. We hire based on attitude, competency, and commitment. Solarians are ready to advance our world-class solutions in a fast-paced environment and accept the challenge to lead with purpose. If you’re looking to build your career with an exceptional team, you’ve come to the right place. Join SolarWinds and grow with us.

About the Role:

As a Senior Staff Site Reliability Engineer, you will play a pivotal role in driving reliability and performance improvements across the SolarWinds Observability Platform. You will work closely with cross-functional engineering teams to manage and reduce SaaS backlogs, ensuring that our platform scales effectively while maintaining the highest standards of reliability and performance. Your ability to drive initiatives, provide technical leadership, and optimize complex systems will be key to our success.

This role demands deep technical expertise, a collaborative mindset, and the ability to mentor a high-performing team of engineers. You will be responsible for driving technical initiatives, overseeing incident response, and improving our platform’s infrastructure while focusing on the integration of emerging technologies such as ClickHouse, Kafka, Karpenter, and Buf.
Key Responsibilities:

Lead and Drive Initiatives: Own and lead strategic initiatives to improve the reliability, scalability, and performance of the SolarWinds Observability Platform, with a strong focus on reducing SaaS backlogs.
SaaS Backlog Management: Collaborate with cross-functional teams to identify, prioritize, and address outstanding backlog items, including incidents, infrastructure improvements, performance optimization, and automation.
Automation & Observability: Lead the development of automation strategies and observability tools to improve platform monitoring, reduce incidents, and enhance performance insights across the infrastructure.
Incident Response & Postmortems: Lead response efforts for production incidents, conducting thorough postmortems, driving continuous improvement initiatives, and ensuring the team learns from each incident.
Platform Engineering Leadership: Drive initiatives related to platform engineering and scale infrastructure systems, ensuring they meet the reliability and performance standards necessary for the SolarWinds Observability Platform.
Mentorship & Team Leadership: Mentor and provide technical guidance to the Site Reliability Engineering (SRE) team, helping them grow their skills and driving a culture of continuous learning and collaboration.
Collaboration & Cross-Functional Engagement: Collaborate closely with engineering, security, and product teams to ensure the seamless integration of new technologies and systems, improving platform reliability and scalability.

Ideal Candidate Attributes:

Strong Leadership Skills: Proven ability to drive initiatives, manage SaaS backlogs, and lead cross-functional teams to successful outcomes.
Collaborative Mindset: Comfortable working with diverse teams across different functions to solve complex problems and build scalable, high-performance systems.
Customer-Focused: A strong customer orientation, with the ability to translate technical challenges into business solutions.
Excellent Communication: Strong interpersonal and communication skills to effectively engage with both technical and non-technical stakeholders.
Problem-Solving & Ownership: A collaborative problem solver with a strong bias for ownership and decisive action.

Qualifications:

13+ years of experience in Site Reliability Engineering, Platform Engineering, or related roles, with extensive experience managing SaaS environments.
8+ years of experience designing, building, and maintaining AWS/Azure infrastructure, using Terraform and automation tools.
5+ years of experience building, running, and scaling Kubernetes clusters in production environments.
Experience with Observability tools (e.g., monitoring, logging, tracing, metrics) and practices for high-performance systems.
Strong expertise with Kafka for real-time data processing, ClickHouse for OLAP workloads, and GitOps CI/CD processes.
Familiarity with Karpenter for Kubernetes autoscaling, and Buf for managing Protocol Buffers at scale is a plus.
Programming experience in Python, Go (Golang), and Bash.
Security Operations Experience: Knowledge of security best practices for cloud-native environments, including encryption, key management, and security policies.
Mentorship Experience: Demonstrated success in mentoring and growing technical teams, fostering a culture of collaboration and continuous learning.

Senior Site Reliability Engineer

2 weeks ago

Bengaluru, India SolarWinds Full time

Your Role: We are seeking a Sr. Site Reliability Engineer (Infrastructure & Site Reliability Engineering) with experience in AWS, Azure, Kubernetes, and GitOps to work with our Site Reliability Engineering (SRE) team. The successful candidate will understand SRE practices and have a track record of implementing high-quality site reliability engineering...
Sr Site Reliability Engineer

2 days ago

Bengaluru, Karnataka, India Visa Full time ₹ 12,00,000 - ₹ 36,00,000 per year

Company Description: Visa is a world leader in payments and technology, with over 259 billion payments transactions flowing safely between consumers, merchants, financial institutions, and government entities in more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable, and secure...
Staff Site Reliability Engineer

1 week ago

Bengaluru, Karnataka, India Okta Full time ₹ 8,00,000 - ₹ 24,00,000 per year

Join our team. We're building a world where Identity belongs to you. Okta's Workforce Identity Cloud Security Engineering group is looking for a Staff Site Reliability Engineer with a passion for DevSecOps, Infrastructure Security, and SRE. Join a team that is not just building solutions but redefining the standards for cloud security. If you have a proven...
Staff Site Reliability Engineer

2 weeks ago

Bengaluru, India Saviynt Full time

Staff Site Reliability Engineer: Saviynt's AI-powered identity platform manages and governs human and non-human access to all of an organization's applications, data, and business processes. Customers trust Saviynt to safeguard their digital assets, drive operational efficiency, and reduce compliance costs. Built for the AI age, Saviynt is today helping...
Staff Site Reliability Engineer

2 weeks ago

Bengaluru East, Karnataka, India Visa Full time ₹ 12,00,000 - ₹ 36,00,000 per year

Visa is a world leader in payments and technology, with over 259 billion payments transactions flowing safely between consumers, merchants, financial institutions, and government entities in more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable, and secure payments network,...
Sr. Site Reliability Engineer

6 days ago

Bengaluru, India Koch Full time

Your Job: Koch Capabilities is seeking an experienced a Site Reliability Engineering (SRE) capability from the ground up, modernizing legacy monitoring tools and practices to drive a culture of reliability, accountability, and automation. If you are passionate about designing resilient systems, influencing strategic decisions, and mentoring the next...
Movius - Senior Staff Site Reliability Engineer - DevOps

4 weeks ago

Bengaluru, India Movius Full time

About the Role: We are looking for a highly experienced Senior Staff Site Reliability Engineer (SRE) to join our dynamic team. The ideal candidate will bring deep technical expertise in DevOps, automation, and large-scale distributed systems, with a strong understanding of cloud operations and CI/CD frameworks. Experience in the telecom domain will be an...
Sr Site Reliability Engineer

2 days ago

Bengaluru, India Shell Recharge Solutions Full time

Senior Site Reliability Engineer: Shell Recharge Solutions is a leader in delivering the new electric mobility future through innovative software, infrastructure, and professional services that empower utilities, cities, fleets, transit agencies, and automakers to deploy EV charging infrastructure at scale. Our technology is connecting EV infrastructure...
Site Reliability Engineering

1 week ago

Bengaluru, Karnataka, India Thakral One Full time US$ 60,000 - US$ 1,20,000 per year

Company Description: Thakral One, headquartered in Singapore, is a technology consulting and services company with a strong presence across Asia. The company specializes in technology-driven consulting, custom solution development, data analytics, and leveraging cloud capabilities to deliver enhanced decision support and practical outcomes. Collaborating...
