Site Reliability Engineering Manager
6 days ago
Fynd is a leading omnichannel platform and tech company specializing in retail tech and innovative products in AI, ML, big data ops, gaming, crypto, image editing, and the learning space. Founded in 2012 by three IIT Bombay alumni, Fynd is headquartered in Mumbai, manages over 1000 brands and more than 10k stores, and services 23k+ pin codes.
Role Overview
As a Site Reliability Engineering Manager at Fynd, you will lead a team of Site Reliability Engineers to ensure the reliability, scalability, and performance of production systems. Your responsibilities will include establishing and evolving SRE practices, incident management, automation, and improving system efficiency. You will also lead efforts to drive system health improvements and collaborate across teams to implement SRE best practices.
Key Responsibilities
- Lead, mentor, and manage a team of 10-30 Site Reliability Engineers, ensuring operational efficiency, system reliability, and scalability.
- Define and enforce Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to ensure system performance and availability.
- Drive incident management processes by quickly mitigating production issues, leading post-incident reviews, and implementing long-term solutions.
- Automate repetitive tasks to reduce manual interventions, using scripting languages like Python or Go, and infrastructure automation tools like Terraform, Ansible, or CloudFormation.
- Ensure observability through best-in-class monitoring and alerting using tools such as Prometheus, Grafana, and New Relic, making sure all systems are adequately monitored.
- Collaborate closely with engineering, product, and platform teams to ensure reliable feature releases and system updates.
- Architect, implement, and manage scalable, highly available systems using technologies like Kubernetes, Docker, Kafka, and serverless computing (e.g., AWS Lambda).
- Conduct capacity planning, ensuring the infrastructure can scale effectively while optimizing for cost and performance.
- Prepare and lead Game Days and other reliability training sessions to ensure the team can handle real-world incident scenarios effectively.
- Drive continuous improvement in reliability processes, adopting the latest industry practices to align with agile methodologies.
Requirements
- 7+ years of experience in Site Reliability Engineering, DevOps, or software engineering roles, with 3+ years in a leadership or Tech Lead role.
- Strong experience with Kubernetes, Docker, and serverless technologies (e.g., AWS Lambda) for managing large-scale, cloud-based infrastructure.
- Expertise in infrastructure as code (IaC) tools like Terraform, Ansible, or CloudFormation to automate deployments and manage cloud infrastructure.
- Proficiency in coding/scripting languages like Python or Go for building automation tools and scripts.
- Hands-on experience with monitoring and alerting tools such as Prometheus, Grafana, New Relic, or similar.
- Familiarity with real-time distributed systems like Kafka and gRPC.
- Strong understanding of cloud platforms such as AWS, Google Cloud Platform (GCP), or hybrid cloud environments.
- Basic knowledge of Linux environments (Red Hat, CentOS) and experience with performance tuning and troubleshooting in production.
- Solid understanding of SLI, SLO, and error budgeting practices for system reliability.
- Previous experience with incident management and driving root cause analysis, remediation, and prevention strategies.
Growth
Growth knows no bounds, as we foster an environment that encourages creativity, embraces challenges, and cultivates a culture of continuous expansion. We are looking at new product lines, international markets, and brilliant people to grow even further. We teach, groom, and nurture our people to become leaders. You get to grow with a company that is growing exponentially.
Flex University: We help you upskill by organizing in-house courses on important subjects.
Learning Wallet: You can also take an external course to upskill and grow; we reimburse the cost.
Culture
Community and Team building activities
Host weekly, quarterly, and annual events/parties.
Wellness
Mediclaim policy for you + parents + spouse + kids
Access to an experienced therapist for better mental health, improved productivity, and work-life balance
We work 5 days from the office and we make sure people have everything they need:
Free meals
Snacks, goodies & a lot of fun culture
Please reach out to me at mangeshgaikwad@gofynd.com to share your profile for consideration.