Site Reliability Engineer

3 days ago


Mumbai, Maharashtra, India Fynd Full time ₹ 8,00,000 - ₹ 24,00,000 per year

Fynd is India's largest omnichannel platform and a multi-platform tech company specializing in retail technology and products in AI, ML, big data, image editing, and the learning space. It provides a unified platform for businesses to seamlessly manage online and offline sales, store operations, inventory, and customer engagement. Serving over 2,300 brands, Fynd is at the forefront of retail technology, transforming customer experiences and business processes across various industries.

Are you passionate about building ultra-reliable systems at scale? Join our team as a Site Reliability Engineer (SRE) and be the driving force behind our site's performance and uptime. Embrace a culture of end-to-end ownership, collaboration, and engineering excellence. In this role, you'll combine software engineering and systems engineering skills to keep our platform massively scalable, fault-tolerant, and lightning-fast, which is exactly what's needed to delight millions of online shoppers. You'll work from our Mumbai headquarters, taking ownership of product reliability from day one and working across teams to keep our services robust and customers happy.

We are looking for an engineer who not only builds performant, scalable applications but also embraces AI as a development multiplier. Your core job will be building and owning web applications end-to-end, but we expect you to use tools like GitHub Copilot, Cursor, ChatGPT, or equivalent to write, debug, test, and document faster and better.

  • Use AI tools (e.g. GitHub Copilot, Cursor, GPT-4, or similar) to accelerate scaffolding, code generation, refactoring, debugging, testing, and documentation

What will you do at Fynd?

  • Influence technical direction by evaluating change requests and participating in architectural discussions across teams to uphold best practices and decide on appropriate technologies.
  • Lead incident response and root cause analysis to rapidly resolve issues and implement preventive measures, ensuring we never fail for the same reason twice.
  • Identify bottlenecks in current processes and build or improve tools to support incident management.
  • Go on-call, respond to automated alerts, and execute playbooks.
  • Continuously monitor and fine-tune our infrastructure using industry-standard observability tools, ensuring high performance even under heavy load.
  • Conduct rigorous load tests for critical sales events and optimise system capacity to handle peak demand seamlessly.
  • Own availability and performance for key products. Be responsible for ensuring the product's architecture, changes, incident response, and technology choices support its target availability and performance levels.
  • Remove unnecessary noise from our signals to obtain a clearer understanding of our platform and enable more effective debugging.
  • Develop production tooling and services to improve our platform's resilience.

Minimum Qualifications:

  • Bachelor's degree (B.E./B.Tech.) in Computer Science or a related technical field, or equivalent practical experience.
  • 2+ years of experience in an SRE or DevOps role, preferably within the e-commerce sector.
  • 2+ years of experience in programming languages such as Go, Python, or JavaScript, coupled with a solid understanding of data structures and algorithms.
  • Experience with containerisation technologies such as Docker and Kubernetes.
  • Experience with cloud platforms like AWS, GCP, or Azure.
  • Experience with monitoring and alerting tools such as Grafana, Prometheus, Sentry, PagerDuty, New Relic, AWS CloudWatch, etc.
  • Proficiency in Unix/Linux shell environments.

Some Specific Requirements:

  • 3+ years of experience in an SRE or DevOps role, preferably within the e-commerce sector.
  • 3+ years of experience managing production infrastructure. Prior experience leading or managing a team is a strong advantage.
  • Experience with message queues like Kafka or RabbitMQ and a strong understanding of event-driven architectures.
  • Experience with orchestration and deployment tools such as Terraform, Pulumi, AWS CloudFormation, etc.
  • Hands-on experience with configuration management systems such as Ansible, Chef, Puppet, SaltStack, etc.
  • Understanding of load testing methodologies and tools such as Grafana k6, Gatling, Locust, Apache JMeter, etc.

What do we offer?

Growth

Growth knows no bounds, as we foster an environment that encourages creativity, embraces challenges, and cultivates a culture of continuous expansion. We are looking at new product lines, international markets and brilliant people to grow even further. We teach, groom and nurture our people to become leaders. You get to grow with a company that is growing exponentially.

Flex University: We help you upskill by organising in-house courses on important subjects.

Learning Wallet: You can also take an external course to upskill and grow, and we reimburse the cost.

Culture

Community and team-building activities

We host weekly, quarterly, and annual events and parties.

Wellness

Mediclaim policy for you + parents + spouse + kids

Access to an experienced therapist for better mental health, improved productivity, and work-life balance

We work from the office 5 days a week to promote collaboration and teamwork. Join us to make an impact in an engaging, in-person environment.


