Senior site reliability engineer

3 weeks ago

India Sapaad Full time

WHO WE ARE
Sapaad is a global leader in unified commerce platforms, delivering world-class software solutions for the food and beverage industry. Our flagship product, also named Sapaad, has achieved remarkable success over the past decade, empowering thousands of F&B businesses across 40+ countries, with many more coming onboard each day. Driven by a passionate team of developers, designers, and product experts, Sapaad is constantly evolving, introducing innovative, industry-defining features that set the benchmark for F&B tech. Headquartered in Singapore, with offices across five countries, Sapaad is backed by seasoned technology veterans with deep expertise in web, mobility, and e-commerce.

JOB OVERVIEW
Sapaad Software Private Limited is seeking a Senior Site Reliability Engineer (SRE) to lead our infrastructure reliability efforts and mentor a growing SRE team. This is a strategic, hands-on leadership position responsible for ensuring the reliability, scalability, and performance of our global cloud-based restaurant management platform, which serves thousands of customers worldwide. As a senior member of our engineering organization, you will take ownership of system availability, drive automation initiatives, and establish SRE best practices across the company. You’ll work at the intersection of development and operations, embedding reliability into every layer of our technology stack while building and leading a team focused on operational excellence. This role is ideal for an experienced SRE professional who is passionate about building resilient systems at scale, mentoring engineering talent, and shaping the reliability culture of a fast-growing SaaS organization.

WHAT YOU’LL DO
Own the reliability, availability, and performance of all production systems supporting our multi-tenant SaaS platform.
Define and manage SLIs, SLOs, and error budgets across critical services.
Architect and implement highly available, fault-tolerant systems on AWS and Heroku.
Proactively monitor and analyze performance to predict capacity needs and prevent issues.
Lead incident management and postmortem processes, driving root cause analysis and preventive actions.
Develop incident response playbooks, implement chaos engineering, and reduce MTTD and MTTR.
Design and implement comprehensive observability solutions (monitoring, logging, and alerting) for microservices and distributed systems.
Enforce security and compliance standards, including access controls, vulnerability management, and patching.
Mentor and lead SRE and infrastructure engineers, driving team growth, knowledge sharing, and operational maturity.
Collaborate with development, DevOps, and product teams to embed reliability practices into every stage of the software lifecycle.

YOU’RE A STRONG FIT IF YOU HAVE
5–8 years of experience in SRE, DevOps, or Systems Engineering roles within SaaS or cloud-based environments.
2+ years in a technical leadership or mentoring capacity.
Proven experience maintaining large-scale, high-availability systems (99.9%+ uptime).
Expertise with AWS (EC2, RDS, S3, ECS/EKS, Lambda) and Heroku.
Proficiency in Infrastructure as Code (Terraform, CloudFormation) and containerization (Docker, Kubernetes).
Strong scripting and automation skills in Python, Bash, or PowerShell.
Experience with CI/CD pipelines (Jenkins, GitLab CI, GitHub Actions) and configuration management tools (Chef, Ansible, Puppet).
Deep understanding of SRE principles: SLIs, SLOs, toil reduction, blameless postmortems, and incident management frameworks.
Hands-on experience with monitoring tools (Prometheus, Grafana, Datadog, New Relic, CloudWatch, ELK).
Excellent leadership, analytical, and communication skills with a customer-first mindset.

PREFERRED QUALIFICATIONS
AWS Certified Solutions Architect (Associate or Professional) certification.
Experience with SOC 2, ISO 27001, GDPR, or PCI DSS compliance frameworks.
Background in microservices architectures, disaster recovery planning, or cost optimization.
Experience in the restaurant, hospitality, or retail technology sectors.

Site Reliability Engineer

2 weeks ago

Hyderabad, India Sonata Software Full time

We're Hiring: Senior Site Reliability Engineer. Location: Onsite (Office: Hyderabad, mandatory from Day 1). Employment Type: Full-time. Notice Period: Immediate to 15 days only. Experience: 8+ years. About the Role: We’re looking for a Senior Site Reliability Engineer (SRE) to lead reliability initiatives across our production systems. This is a high-impact...

Senior/expert site

2 weeks ago

India IVedha Inc. Full time

Senior Site Reliability Engineer (SRE) – ELK Expert | Platform Engineering Practice. Location: India (Remote); must be available to work in the EST (US/Canada) time zone. Role Summary: Are you a Senior Site Reliability Engineer (SRE) with deep ELK expertise, ready to take ownership of large-scale observability infrastructure? We're looking for an SRE with 7+...

Senior Site Reliability Engineer

3 weeks ago

Pune, India Barclays Full time

Job Description: Step into the role of Senior Site Reliability Engineer. At Barclays, we are more than a bank; we are a force for progress. You will be part of the central SRE (Site Reliability Engineering) core team within our wider Infrastructure team. You will act as a centre of excellence, providing hands-on consultancy to our different infrastructure...

Senior Site Reliability Engineer

7 days ago

India Akamai Full time ₹ 12,00,000 - ₹ 36,00,000 per year

Would you enjoy improving the stability and safety of one of the largest global networks? Would you enjoy hands-on network operations work on a global scale to improve our operational efficiency? Join the Platform Cloud Services Engineering Team. The Platform Cloud Services SRE team supports globally distributed hosting and database systems for Akamai. These systems...

Senior II Site Reliability Engineer

1 week ago

India Akamai Full time ₹ 8,00,000 - ₹ 25,00,000 per year

Do you have the passion to architect and lead the next generation of public cloud infrastructure? Would you like to lead modernization initiatives while building a public cloud platform from scratch? Join our IaaS Site Reliability Engineering (SRE) team. We design, develop, and operate infrastructure and services that power the backbone of our cloud platform...

Senior Site Reliability Engineer

3 weeks ago

Chennai, Tamil Nadu, India Tata Consultancy Services Full time

Dear Candidates, Greetings from TCS! TCS is looking for a Senior Site Reliability Engineer – AWS. Experience: 8–12 years. Location: Chennai. Must-have skills: Design, implement, and maintain scalable, secure, and highly available infrastructure on AWS. Develop and improve CI/CD pipelines and Infrastructure as Code (IaC) using Terraform and Harness. Own and implement...

Senior Site Reliability Engineer

4 weeks ago

India Akamai Technologies Full time

Job Description: Do you like collaborating across teams to solve complex problems? Do you enjoy solving large-scale distributed systems problems? Join the Mapping SRE team. The Mapping SRE team manages availability, reliability, performance, and change processes for Akamai's mapping system. This system routes trillions of daily client...

Site Reliability Engineer

2 weeks ago

India Akamai Full time ₹ 8,00,000 - ₹ 24,00,000 per year

Do you like collaborating across teams to solve complex problems? Do you enjoy solving large-scale distributed content delivery challenges? Join our highly skilled Compute Site Reliability team. Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We specialize in creating solutions that...

Site Reliability Engineer

2 weeks ago

Bengaluru, Karnataka, India WhiteLotus Talent Partners Full time

We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure, which is powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high system...