Manager, Site Reliability Engineering
5 days ago
Veeam, the #1 global market leader in data resilience, believes businesses should control all their data whenever and wherever they need it. Veeam provides data resilience through data backup, data recovery, data portability, data security, and data intelligence. Based in Seattle, Veeam protects over 550,000 customers worldwide who trust Veeam to keep their businesses running. Join us as we move forward together, growing, learning, and making a real impact for some of the world's biggest brands. The future of data resilience is here - go fearlessly forward with us.
About the Role
Veeam is expanding its global Site Reliability Engineering (SRE) organization to support the Veeam Data Cloud (VDC). As an SRE Manager, you will report to our Global Director of SRE and will build and lead a high-performing team that partners with product, platform, and security engineering to make our systems reliable, scalable, and observable from the ground up. You'll collaborate with peer engineering leaders to embed reliability into service roadmaps, and you'll represent your team in global SRE planning and delivery of cross-cutting reliability initiatives across all VDC services.
You'll drive adoption of SRE principles (SLIs/SLOs/error budgets, toil reduction, blameless learning) and operate a healthy, daytime follow-the-sun on-call model in partnership with our other regions. You will lead your team in making code contributions that improve the overall operability, reliability, resilience, and security of the codebase(s) we support.
What You'll Do
People & Team Leadership
- Hire, onboard, and grow your SRE team; coach career development and performance.
- Foster a psychologically safe, blameless culture that favors learning and emphasizes engineering over firefighting.
- Ensure sustainable operational coverage; monitor on-call health and workload.
- Track and cap toil so engineers spend the majority of their time on project work that reduces future toil.
Reliability Strategy & Governance
- Establish and operationalize SLIs/SLOs and error budgets with service owners; run reliability reviews and hold teams accountable to outcomes.
- Define reliability standards, runbooks, readiness checklists, and alerting patterns (including SLO-based alerting).
- Partner with product/EMs to align reliability work with service goals and customer experience, not as a gate but as an enabler.
Operations & Incident Excellence
- Ensure incident response readiness; lead/coordinate major incidents; drive fast, high-quality postmortems and systemic fixes.
- Measure MTTR, change failure rate, SLO posture, and repeat-incident reduction; publish learning broadly.
Engineering & Automation
- Lead software-first reliability investments: observability, deployment safety (canary/blue-green), resilience testing/chaos, and self-service guardrails.
- Drive platform improvements (IaC, CI/CD, Kubernetes) and internal tools that scale operations and improve developer experience.
Required
- 7+ years in Software, Platform, and/or Reliability Engineering with 2+ years managing engineers.
- Demonstrable experience leading engineering teams to predictably deliver outcomes.
- Experience leading cross-functional initiatives collaboratively with peers through influence.
- Experience with public cloud (Azure preferred), Kubernetes, IaC (Terraform, Pulumi), CI/CD (GitHub Actions, ArgoCD, Azure DevOps), and observability (OpenTelemetry, Elastic, Datadog, Prometheus, Grafana).
- Coding background with experience improving service reliability.
- Hands-on incident management and postmortem practice; excellent cross-geo communication.
- Willingness to participate in an on-call rotation (typically during daytime hours, including weekends/holidays).
Preferred
- Demonstrated success leading SLO/error-budget adoption and reliability programs for cloud services.
- Experience operating a multi-region, follow-the-sun on-call model.
- Background in chaos/resilience/performance testing and release validation.
- Track record building or scaling SRE teams and influencing org-wide standards.
- Familiarity with compliance frameworks common to SaaS.
What You'll Get
- 18 paid vacation days, plus 4 extra global VeeaMe Days for self-care and 24 paid volunteer hours annually through Veeam Cares
- Private medical coverage for you and up to four dependents
- Life, accident, and disability insurance with enhanced coverage
- Annual flexible wellbeing allowance for physical and mental wellness
- Free confidential counselling and coaching via Employee Assistance Program (EAP), including legal and financial advice
- Meal, fuel, and transportation benefits based on work arrangement
- Daycare reimbursement and safe cab facility for eligible employees
- Opportunities to learn and grow through on-demand libraries (LinkedIn Learning, O'Reilly), mentoring, workshops, and learning events like our annual Global Day of Learning
Veeam Software is an equal opportunity employer and does not tolerate discrimination in any form on the basis of race, color, religion, gender, age, national origin, citizenship, disability, veteran status or any other classification protected by federal, state or local law. All your information will be kept confidential.
Please note that any personal data collected from you during the recruitment process will be processed in accordance with our Recruiting Privacy Notice.
The Privacy Notice sets out the basis on which the personal data collected from you, or that you provide to us, will be processed by us in connection with our recruitment processes.
By applying for this position, you consent to the processing of your personal data in accordance with our Recruiting Privacy Notice.
By submitting your application, you acknowledge that the information provided in your job application and any supporting documents is complete and accurate to the best of your knowledge. Any misrepresentation, omission, or falsification of information may result in disqualification from consideration for employment or, if discovered after employment begins, termination of employment.
-
BA4 Site Reliability Engineer
1 week ago
Pune, Maharashtra, India | Barclays Investment Bank | Full time
Company Description: Barclays Investment Bank provides innovative financial solutions to support clients with their funding, financing, strategic, and risk management needs across various sectors and global markets. The bank operates through Investment Banking, International Corporate Banking, Global Markets, and Research divisions, serving a diverse range of...
-
Site Reliability Engineer
7 days ago
Pune, Maharashtra, India | Jefferies Financial Group | Full time
Description
Position Title: Site Reliable Engineer (SRE) for Equity Trading Platform
Job Description: Jefferies is seeking a Site Reliability Engineer to play an instrumental role in supporting the Equity front-office trading application and the risk and middle-office real-time products developed and used for the Equity Cash and ETS applications. As part of the wider...
-
Site Reliability Engineer
4 hours ago
Pune, Maharashtra, India | Digital Twin | Full time
Job Description:
· Intangles Lab is looking for a hands-on Site Reliability Engineer from a FinTech background to manage large 24×7 cloud operations.
· Looking for a Site Reliability Engineer with 2+ years of experience and hands-on experience with the following technologies/skill set:
Must-Have Skills:
· AWS Cloud (Advanced): Certification is...
-
Site Reliability Engineer
2 weeks ago
Pune, Maharashtra, India | SingleStore | Full time
Position Overview: SingleStore is seeking a Site Reliability Engineer to help optimize and scale our managed service offering across all three major cloud providers. In this role, you will be at the intersection of leading technology trends: a highly performant distributed database, managed by Kubernetes and running in the cloud. This is a great opportunity to...
-
Site Reliability Engineer
2 days ago
Pune, Maharashtra, India | UBS | Full time
Your role: We are seeking a highly experienced Site Reliability Engineer (SRE) to join our technology team in a mission-critical financial environment. This role is ideal for someone who has a proven track record of building and operating reliable, scalable systems in regulated industries such as banking or financial services. As a Senior SRE, you will be...
-
Site Reliability Engineer
2 weeks ago
Pune, Maharashtra, India | METRO Global Solution Center IN | Full time
Company Description
About us: Metro Global Solution Center (MGSC) is the internal solution partner for METRO, a €29.8 billion international wholesaler with operations in 32 countries through 625 stores and a team of 91,000 people globally. Metro operates in a further 10 countries with its Food Service Distribution (FSD) business and is thus active in a total of...
-
Site Reliability Engineer
2 weeks ago
Pune, Maharashtra, India | CIEL HR | Full time
Key Responsibilities
SRE Implementation & Reliability Engineering
Drive end-to-end SRE strategy and implementation, ensuring systems meet reliability, scalability, and performance objectives.
Establish and enforce SRE best practices including SLIs, SLOs, SLAs, error budgets, incident response processes, and postmortems.
Lead efforts to automate repetitive...
-
Manager, Site Reliability Engineering
3 days ago
Pune, Maharashtra, India | Veeam Software | Full time
Veeam, the #1 global market leader in data resilience, believes businesses should control all their data whenever and wherever they need it. Veeam provides data resilience through data backup, data recovery, data portability, data security, and data intelligence. Based in Seattle, Veeam protects over 550,000 customers worldwide who trust Veeam to keep their...
-
Site Reliable Engineer
7 days ago
Pune, Maharashtra, India | Jefferies | Full time
Job Description
Position Title: Site Reliable Engineer (SRE) for Equity Trading Platform
Job Description: Jefferies is seeking a Site Reliability Engineer to play an instrumental role in supporting the Equity front-office trading application and the risk and middle-office real-time products developed and used for the Equity Cash and ETS applications. As part of the wider...
-
Site Reliability Engineer
1 hour ago
Pune, Maharashtra, India | Zensar | Full time
We are looking for a skilled and proactive Site Reliability Engineer (SRE) with 10 years of experience. The SRE will be responsible for ensuring the reliability, scalability, and performance of our systems and infrastructure. This role blends software engineering with IT operations to build fault-tolerant, self-healing systems and drive continuous improvement across...