Site Reliability Engineering Manager

6 days ago


Fynd (Shopsense Retail Technologies Ltd.)
Mumbai, Maharashtra, India
Full time

About Fynd

Fynd is a leading omnichannel platform and tech company specializing in retail tech and innovative products in AI, ML, big data ops, gaming, crypto, image editing, and the learning space. Founded in 2012 by three IIT Bombay alumni, Fynd is headquartered in Mumbai, has over 1,000 brands under management and more than 10,000 stores, and serves 23,000+ pin codes.

Role Overview

As a Site Reliability Engineering Manager at Fynd, you will lead a team of Site Reliability Engineers to ensure the reliability, scalability, and performance of production systems. Your responsibilities will include establishing and evolving SRE practices, managing incidents, driving automation, and improving system efficiency. You will also lead system health improvements and collaborate across teams to implement SRE best practices.

Key Responsibilities
  • Lead, mentor, and manage a team of 10-30 Site Reliability Engineers, ensuring operational efficiency, system reliability, and scalability.
  • Define and enforce Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to ensure system performance and availability.
  • Drive incident management processes by quickly mitigating production issues, leading post-incident reviews, and implementing long-term solutions.
  • Automate repetitive tasks to reduce manual intervention, using programming languages such as Python or Go and infrastructure automation tools such as Terraform, Ansible, or CloudFormation.
  • Ensure observability by building best-in-class monitoring and alerting with tools such as Prometheus, Grafana, and New Relic, so that every system is adequately monitored.
  • Collaborate closely with engineering, product, and platform teams to ensure reliable feature releases and system updates.
  • Architect, implement, and manage scalable, highly available systems using technologies like Kubernetes, Docker, Kafka, and serverless computing (e.g., AWS Lambda).
  • Conduct capacity planning, ensuring the infrastructure can scale effectively while optimizing for cost and performance.
  • Prepare and lead Game Days and other reliability training sessions to ensure the team can handle real-world incident scenarios effectively.
  • Drive continuous improvement in reliability processes, adopting current industry practices and aligning them with agile methodologies.

Requirements
  • 7+ years of experience in Site Reliability Engineering, DevOps, or software engineering roles, with 3+ years in a leadership or Tech Lead role.
  • Strong experience with Kubernetes, Docker, and serverless technologies (e.g., AWS Lambda) for managing large-scale, cloud-based infrastructure.
  • Expertise in infrastructure as code (IaC) tools like Terraform, Ansible, or CloudFormation to automate deployments and manage cloud infrastructure.
  • Proficiency in coding/scripting languages like Python or Go for building automation tools and scripts.
  • Hands-on experience with monitoring and alerting tools such as Prometheus, Grafana, New Relic, or similar.
  • Familiarity with real-time distributed systems and related technologies such as Kafka and gRPC.
  • Strong understanding of cloud platforms such as AWS or Google Cloud Platform (GCP), as well as hybrid cloud environments.
  • Basic knowledge of Linux environments (Red Hat, CentOS) and experience with performance tuning and troubleshooting in production.
  • Solid understanding of SLIs, SLOs, and error budgeting practices for system reliability.
  • Previous experience with incident management and driving root cause analysis, remediation, and prevention strategies.

What We Offer

Growth

Growth knows no bounds, as we foster an environment that encourages creativity, embraces challenges, and cultivates a culture of continuous expansion. We are looking at new product lines, international markets, and brilliant people to grow even further. We teach, groom, and nurture our people to become leaders. You get to grow with a company that is growing exponentially.

Flex University: We help you upskill by organizing in-house courses on important subjects.

Learning Wallet: You can also take an external course to upskill and grow, and we will reimburse it.

Culture

Community and team-building activities

We host weekly, quarterly, and annual events and parties.

Wellness

Mediclaim policy covering you, your parents, your spouse, and your kids

Access to an experienced therapist for better mental health, productivity, and work-life balance

We work 5 days from the office and we make sure people have everything they need:

Free meals

Snacks, goodies, and a fun culture

Please reach out to me at mangeshgaikwad@gofynd.com to share your profile for consideration.