Senior Site Reliability Engineering Manager

4 weeks ago


Bengaluru, Karnataka, India Seven N Half Full time

Job Details - Sr. Engineering Manager - SRE

About Us:

The Super-app is a future-ready company that focuses on creating consumer-centric, high-engagement digital products. By creating a holistic presence across various touchpoints, we aim to be the trusted partner of every consumer and delight them by powering a rewarding life. The company's debut offering is a super-app that provides an integrated rewards experience across various consumer categories like groceries, fashion and electronics, travel and hospitality, health and fitness, entertainment, and financial services on a single platform.

Our Culture:

We cultivate a culture of innovation and inclusion for all employees, and we respect their individual strengths, views, and experiences. We thrive on the diversity of our talent in all forms and see it as a strength in building high-performance teams across brands. As we rewrite commerce in India, change is the only constant in our day-to-day lives.

Role Overview:

We are looking for a Sr. Engineering Manager - SRE to oversee the stability, scalability, and delivery of our production environment, leveraging software engineering principles and automation to improve cloud infrastructure management and reduce operational costs. This role will play a key part in transitioning from manual processes to automated solutions by leading our current DevOps teams:

  1. Cloud Infra Lifecycle Management Team: Focused on automated provisioning, capacity planning, and maintenance across all cloud platforms for production applications.
  2. Cloud Infra Support Team: Responsible for supporting internal users with production and development environment requests, with a long-term goal of eliminating manual intervention through automation.

This role is ideal for a leader with a deep understanding of Azure cloud environments, SRE best practices, and a strong background in building automation-first operational models.

Key Responsibilities:

Stability, Scalability & Availability:

  • Lead the design and implementation of strategies to ensure high availability, reliability, and performance of production systems.
  • Apply lifecycle management techniques, including monitoring, capacity planning, and automated scaling, to cloud environments.
  • Establish Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets for critical applications.

Cloud Lifecycle Management:

  • Oversee the Cloud Infra Lifecycle Management Team to build scalable, automated cloud provisioning workflows and optimize capacity.
  • Implement infrastructure-as-code (IaC) practices using tools like Terraform, PowerShell, and Azure Resource Manager (ARM) templates.
  • Ensure efficient cloud resource utilization and cost management strategies.

Cloud Support Operations:

  • Manage the Cloud Infra Support Team responsible for handling internal user requests related to production and development environments.
  • Develop efficient workflows for incident response and request resolution, with automation as the default approach.
  • Work towards eliminating the need for manual support teams by creating self-service solutions for internal users.

Automation & Transformation:

  • Lead the transition of manual processes to cloud automation through training, upskilling, and process reengineering.
  • Champion the use of automation to handle repetitive operational tasks, including monitoring, remediation, and deployments.
  • Foster a "first principles thinking" culture focused on engineering excellence and process simplification.

Monitoring & Incident Response:

  • Build robust monitoring systems using Azure Monitor, Log Analytics, and Application Insights for proactive performance management.
  • Oversee incident response processes, ensuring rapid recovery and root cause analysis for production disruptions.
  • Implement disaster recovery and high-availability strategies across environments.

Security & Compliance:

  • Ensure all environments follow cloud security best practices, regulatory compliance, and corporate governance policies.
  • Manage identity and access controls, network security, and risk mitigation strategies.

Continuous Improvement:

  • Drive ongoing improvements in system resilience, operational efficiency, and service quality through automation and best practices.
  • Conduct regular performance reviews and capacity planning exercises to maintain optimal system health.

Team Leadership & Development:

  • Provide coaching and mentorship to the SRE team, fostering a culture of continuous learning and technical excellence.
  • Lead efforts to upskill the team in cloud scripting, automation development, and site reliability best practices.

Reporting & Metrics:

  • Maintain detailed operational documentation and generate regular reports on system performance, reliability improvements, and cost efficiency efforts.

Basic Qualifications:

  • 10+ years of experience in cloud operations or SRE, with a strong focus on Azure environments.
  • Extensive experience in managing and optimizing Azure services like Virtual Machines, App Services, SQL Database, Networking, and Storage.
  • Hands-on expertise with cloud automation and IaC tools (Terraform, PowerShell, ARM templates, or Azure Automation).
  • Strong understanding of SRE principles, including error budgets, SLOs, SLIs, and incident management practices.
  • Proficiency with Azure DevOps and CI/CD pipeline management.
  • Expertise in cloud cost management and optimization.
  • Familiarity with monitoring, logging, and observability tools (e.g., Azure Monitor, Log Analytics, Security Centre).
  • Knowledge of Azure security practices, including identity and access management, firewalls, and compliance requirements.

Preferred Qualifications:

  • Microsoft Certified: Azure Solutions Architect Expert or Azure Administrator Associate.
  • Experience managing hybrid or multi-cloud environments.
  • Experience implementing self-service workflows and internal user support automation.

Soft Skills:

  • Strong leadership and team management abilities.
  • Excellent communication and client engagement skills.
  • Analytical mindset with a proactive approach to problem-solving.
  • Ability to handle high-pressure situations with professionalism.


  • Bengaluru, Karnataka, India Josys Full time

    Senior Site Reliability Engineer (SRE) About Josys: Josys, a dynamic B2B SaaS platform startup, has embarked on a mission to revolutionize IT operations globally, following an exceptional launch in Japan and securing $125 million in Series A and B funding. Our platform enables businesses to conquer the complexities of work-from-anywhere setups, rapid...


  • Bengaluru, Karnataka, India WhiteLotus Talent Partners Full time

    We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high...


  • Bengaluru, Karnataka, India Aerospike Full time

    Job Description About Aerospike At Aerospike, we dream big. Our focus is helping companies tackle seemingly insurmountable problems and doing what's never been done before. That is why we developed the world's leading real-time data platform that powers mission-critical applications at the world's most innovative, category-disrupting companies....


  • Bengaluru, Karnataka, India SolarWinds Full time

    About the Role: As a Senior Staff Site Reliability Engineer (SRE) at SolarWinds, you will drive the reliability, scalability, and performance of our Observability Platform. This role focuses on managing SaaS infrastructure at scale, improving system reliability through cloud-native architecture, advanced data platform operations, and automation. You will...


  • Bengaluru, Karnataka, India Synechron Full time

    We have immediate opportunity for SRE (Senior Site Reliability Engineer) 5 to 9 years. Synechron – Bangalore Job Role: - SRE (Senior Site Reliability Engineer) Job Location: - Bangalore Notice Period: Within 30days About Synechron We began life in 2001 as a small, self-funded team of technology specialists. Since then, we've grown our organization to 14,500+...


  • Bengaluru, Karnataka, India Synechron Full time

    We have immediate opportunity for SRE (Senior Site Reliability Engineer) 5 to 9 years. Synechron – Bangalore Job Role: - SRE (Senior Site Reliability Engineer) Job Location: - Bangalore Notice Period: Within 30days About Synechron We began life in 2001 as a small, self-funded team of technology specialists. Since then, we've grown our organization to...


  • Bengaluru, Karnataka, India Pearson Full time

    The Senior Site Reliability Engineer plays a crucial role within a small team, ensuring our critical services are secure, reliable, cost-effective, performant, and operationally excellent. This position demands a versatile professional who can contribute across development, system operations, resiliency testing, security hardening, and performance...


  • Bengaluru, Karnataka, India WhiteLotus Talent Partners Full time

    Our client is looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high system...


  • Bengaluru, Karnataka, India Okta Full time US$ 1,50,000 - US$ 2,00,000 per year

    Get to know Okta: Okta is The World's Identity Company. We free everyone to safely use any technology, anywhere, on any device or app. Our flexible and neutral products, Okta Platform and Auth0 Platform, provide secure access, authentication, and automation, placing identity at the core of business security and growth. At Okta, we celebrate a variety of...