Site Reliability Engineer

1 week ago


Chennai, Tamil Nadu, India Weekday AI Full time ₹ 7,00,000 - ₹ 12,00,000 per year

This role is for one of Weekday's clients
Location: Chennai
JobType: full-time

Requirements

What will you do?

  • We're looking for a self-motivated, enthusiastic, and hands-on engineer to set up solid DevOps and SRE foundations. If you thrive in a small, high-energy team and want to play a key role in shaping infrastructure and reliability at scale, this is the place for you.
  • The ideal candidate has 3–6 years of experience, a solid grasp of cloud infrastructure, a strong foundation in Infrastructure as Code (IaC), and a keen eye for choosing the right tools for the job. You'll help design, build, and scale resilient infrastructure for a fast-growing, product-driven team.
  • Design, build, and manage cloud infrastructure using Infrastructure as Code (IaC) tools like Terraform, Ansible, Chef, or CloudFormation.
  • Champion observability by defining SLIs and SLOs and by building robust monitoring, logging, and alerting systems using tools like Prometheus, Grafana, and custom telemetry.
  • Ensure availability, scalability, and resilience of our SaaS platform and platform services in production.
  • Improve system observability by designing and instrumenting system-level metrics that give visibility into system health, performance, and bottlenecks.
  • Dive deep into complex system architectures to solve critical performance and reliability challenges.
  • Work with developers and product teams to embed non-functional requirements (NFRs) into every product and feature release.
  • Conduct root cause analysis and system-level debugging (primarily on Linux).
  • Build and maintain CI/CD pipelines, automating deployments and infrastructure operations across environments.
  • Scale infrastructure to meet growth needs while optimizing cost and performance.
  • Take ownership of incident response, on-call rotations, and blameless postmortems.
  • Collaborate cross-functionally to drive technical and architectural decisions.
  • Highly self-driven, accountable, and eager to own initiatives end-to-end. Comfortable working in startups or small teams, where flexibility, speed, and autonomy are key. Strong communication and cross-team collaboration skills.

You should apply if

  • Proficient in at least one programming language — Python, Java, or similar.
  • Demonstrated experience with performance optimization, latency reduction, and scaling services.
  • Strong analytical skills for incident debugging, log analysis, and system troubleshooting.
  • Understanding of service-level metrics (SLIs, SLOs, error budgets) and how to operationalize them.
  • Experience building large-scale, distributed, resilient systems.
  • Strong understanding of core infrastructure components such as load balancers, firewalls, and databases — including their internal workings and operational fundamentals.
  • Solid understanding of infrastructure cost management: able to proactively identify cost drivers, implement optimization strategies, and contribute to cost-reduction initiatives without compromising reliability or performance.
  • Familiarity with on-call responsibilities, incident management, and root cause analysis.
  • Strong experience with Infrastructure as Code (IaC) tools such as Terraform, Ansible, Chef, or CloudFormation, and with other orchestration tools.
  • Ability to deep-dive into third-party or internal library codebases to understand internal behavior, debug complex issues, and contribute insights or fixes when needed.
  • Solid understanding of cloud platforms — preferably AWS, but Azure or GCP is also acceptable.


  • Chennai, Tamil Nadu, India Zyoin Group Full time

    Position: Site Reliability Engineer (SRE) Experience: 4 – 10 Years Location: Chennai (Hybrid – 2 days in office) Role Overview: We are seeking a Site Reliability Engineer (SRE) responsible for leading reliability practices, ensuring scalable systems, and collaborating with development teams to maintain highly available services. Key Responsibilities - Design,...


  • Chennai, Tamil Nadu, India Zyoin Group Full time

    Position: Site Reliability Engineer (SRE) Experience: 4 – 10 Years Location: Chennai (Hybrid – 2 days in office) Role Overview: We are seeking a Site Reliability Engineer (SRE) responsible for leading reliability practices, ensuring scalable systems, and collaborating with development teams to maintain highly available services. Key Responsibilities ...


  • Chennai, Tamil Nadu, India Zyoin Group Full time

    Work Mode: Hybrid (2 days Office) We are looking for a Site Reliability Engineer (SRE) who will lead the Site Reliability Engineering (SRE) side of each of our products. This position is responsible for making technical decisions, collaborating with development teams and platform engineers, and building and operating highly reliable and scalable products....


  • Chennai, Tamil Nadu, India Concord Full time

    SRE Sr. Engineers (Individual Contributors) Key Attributes: Strong SRE (Site Reliability Engineering) experience; DevOps skills – CI/CD, monitoring, automation, infrastructure as code, etc.; Excellent troubleshooting and debugging skills (infrastructure + application level); Perseverance – must push through complex/challenging issues without giving up; Able to...


  • Chennai, Tamil Nadu, India Zyoin Group Full time

    Job Description Exp: 4-10 Years Location: Chennai Work Mode: Hybrid (2 days Office) We are looking for a Site Reliability Engineer (SRE) who will lead the Site Reliability Engineering (SRE) side of each of our products. This position is responsible for making technical decisions, collaborating with development teams and platform engineers, and building...


  • Chennai, Tamil Nadu, India Zyoin Group Full time

    Job Description Exp: 4-10 Years Location: Chennai Work Mode: Hybrid (2 days Office) We are looking for a Site Reliability Engineer (SRE) who will lead the Site Reliability Engineering (SRE) side of each of our products. This position is responsible for making technical decisions, collaborating with development teams and platform engineers, and building and...


  • Chennai, Tamil Nadu, India Intellect Design Arena Full time ₹ 5,00,000 - ₹ 8,00,000 per year

    Job Title: Site Reliability Engineer Company: Intellect Design Arena Ltd Location: Chennai, India Experience Required: 6+ years Job Type: Full-time Department: SRE / DevOps / Engineering Enablement About Intellect Design Arena Ltd: Intellect Design Arena Ltd is a global leader in digital financial technology, offering cutting-edge solutions for banking, insurance,...


  • Chennai, Tamil Nadu, India NatWest Markets Full time

    Job Description Join us as a Site Reliability Engineer - You'll be managing the provision of stable, resilient, reliable applications with the end goal of minimising disruption to Customer & Colleague Journeys (CCJ) - We'll look to you to identify and automate manual tasks and implement observability solutions, ensuring a thorough understanding of CCJ across...


  • Chennai, Tamil Nadu, India beBeeEngineering Full time ₹ 2,00,00,000 - ₹ 2,50,00,000

    Stability, scalability and operational excellence of Accounting and Finance platforms are critical to success. We're seeking a highly skilled engineer to play a pivotal role in ensuring that these systems operate with consistency and trustworthiness. Reliability & Availability: Ensure Accounting and Finance platforms meet defined SLAs, SLOs, and SLIs for...