Senior site reliability engineer


India Sapaad Full time

WHO WE ARE

Sapaad is a global leader in unified commerce platforms, delivering world-class software solutions for the food and beverage industry. Our flagship product, also named Sapaad, has achieved remarkable success over the past decade, empowering thousands of F&B businesses across 40+ countries, with many more coming on board each day. Driven by a passionate team of developers, designers, and product experts, Sapaad is constantly evolving, introducing innovative, industry-defining features that set the benchmark for F&B tech. Headquartered in Singapore, with offices across five countries, Sapaad is backed by seasoned technology veterans with deep expertise in web, mobility, and e-commerce.

JOB OVERVIEW

Sapaad Software Private Limited is seeking a Senior Site Reliability Engineer (SRE) to lead our infrastructure reliability efforts and mentor a growing SRE team. This is a strategic, hands-on leadership position responsible for ensuring the reliability, scalability, and performance of our global cloud-based restaurant management platform, which serves thousands of customers worldwide.

As a senior member of our engineering organization, you will take ownership of system availability, drive automation initiatives, and establish SRE best practices across the company. You'll work at the intersection of development and operations, embedding reliability into every layer of our technology stack while building and leading a team focused on operational excellence.

This role is ideal for an experienced SRE professional who is passionate about building resilient systems at scale, mentoring engineering talent, and shaping the reliability culture of a fast-growing SaaS organization.

WHAT YOU'LL DO

  • Own the reliability, availability, and performance of all production systems supporting our multi-tenant SaaS platform.
  • Define and manage SLIs, SLOs, and error budgets across critical services.
  • Architect and implement highly available, fault-tolerant systems on AWS and Heroku.
  • Proactively monitor and analyze performance to predict capacity needs and prevent issues.
  • Lead incident management and postmortem processes, driving root cause analysis and preventive actions.
  • Develop incident response playbooks, implement chaos engineering, and reduce MTTD and MTTR.
  • Design and implement comprehensive observability solutions (monitoring, logging, and alerting) for microservices and distributed systems.
  • Enforce security and compliance standards, including access controls, vulnerability management, and patching.
  • Mentor and lead SRE and infrastructure engineers, driving team growth, knowledge sharing, and operational maturity.
  • Collaborate with development, DevOps, and product teams to embed reliability practices into every stage of the software lifecycle.

YOU'RE A STRONG FIT IF YOU HAVE

  • 5–8 years of experience in SRE, DevOps, or Systems Engineering roles within SaaS or cloud-based environments.
  • 2+ years in a technical leadership or mentoring capacity.
  • Proven experience maintaining large-scale, high-availability systems (99.9%+ uptime).
  • Expertise with AWS (EC2, RDS, S3, ECS/EKS, Lambda) and Heroku.
  • Proficiency in Infrastructure as Code (Terraform, CloudFormation) and containerization (Docker, Kubernetes).
  • Strong scripting and automation skills in Python, Bash, or PowerShell.
  • Experience with CI/CD pipelines (Jenkins, GitLab CI, GitHub Actions) and configuration management tools (Chef, Ansible, Puppet).
  • Deep understanding of SRE principles: SLIs, SLOs, toil reduction, blameless postmortems, and incident management frameworks.
  • Hands-on experience with monitoring tools (Prometheus, Grafana, Datadog, New Relic, CloudWatch, ELK).
  • Excellent leadership, analytical, and communication skills with a customer-first mindset.

PREFERRED QUALIFICATIONS

  • AWS Certified Solutions Architect – Associate or Professional certification.
  • Experience with SOC 2, ISO 27001, GDPR, or PCI DSS compliance frameworks.
  • Background in microservices architectures, disaster recovery planning, or cost optimization.
  • Experience in the restaurant, hospitality, or retail technology sectors.



  • India Akamai Technologies Full time

    Do you have the passion to architect and lead the next generation of public cloud infrastructure? Would you like to lead modernization initiatives while building a public cloud platform from scratch? Join our IaaS Site Reliability Engineering (SRE) team. We design, develop, and operate infrastructure and services that power...


  • India Akamai Full time ₹ 12,00,000 - ₹ 36,00,000 per year

    Would you enjoy improving the stability and safety of one of the largest global networks? Would you enjoy hands-on network operations work on a global scale to improve our operational efficiency? Join the Platform Cloud Services Engineering Team. The Platform Cloud Services SRE team supports globally distributed hosting and database systems for Akamai. These systems...


  • Pune, India Barclays Full time

    Step into the role of Senior Site Reliability Engineer. At Barclays, we are more than a bank; we are a force for progress. You will be part of the central SRE (Site Reliability Engineering) core team within our wider Infrastructure team. You will act as a centre of excellence, providing hands-on consultancy to our different infrastructure...


  • India Synechron Full time

    We have an immediate opportunity for an SRE (Senior Site Reliability Engineer) with 5 to 9 years of experience at Synechron, Bangalore. Job Role: SRE (Senior Site Reliability Engineer). Job Location: Bangalore. Notice Period: within 30 days. About Synechron: We began life in 2001 as a small, self-funded team of technology specialists. Since then, we've grown our organization to 14,500+...


  • India Akamai Full time ₹ 8,00,000 - ₹ 25,00,000 per year

    Do you have the passion to architect and lead the next generation of public cloud infrastructure? Would you like to lead modernization initiatives while building a public cloud platform from scratch? Join our IaaS Site Reliability Engineering (SRE) team. We design, develop, and operate infrastructure and services that power the backbone of our cloud platform....


  • India Akamai Full time

    Do you have the passion to architect and lead the next generation of public cloud infrastructure? Would you like to lead modernization initiatives while building a public cloud platform from scratch? Join our IaaS Site Reliability Engineering (SRE) team. We design, develop, and operate infrastructure and services that power the backbone of our cloud...


  • Chennai, Tamil Nadu, India Tata Consultancy Services Full time

    Dear Candidates, Greetings from TCS! TCS is looking for a Senior Site Reliability Engineer – AWS. Experience: 8–12 years. Location: Chennai. Must-have skills: Design, implement, and maintain scalable, secure, and highly available infrastructure on AWS. Develop and improve CI/CD pipelines and Infrastructure as Code (IaC) using Terraform and Harness. Own and implement...


  • India Akamai Technologies Full time

    Do you like collaborating across teams to solve complex problems? Do you enjoy solving large-scale distributed systems problems? Join the Mapping SRE team. The Mapping SRE team manages availability, reliability, performance, and change processes for Akamai's mapping system. This system routes trillions of daily client...


  • India iVedha Inc. Full time

    Senior Site Reliability Engineer (SRE) – ELK Expert | Platform Engineering Practice. Location: India (Remote). Must be available to work in the EST (US/Canada) time zone. Role Summary: Are you a Senior Site Reliability Engineer (SRE) with deep ELK expertise, ready to take ownership of large-scale observability infrastructure? We're looking for an SRE with 7+...