
Site Reliability Engineer
7 days ago
This role is for one of Weekday's clients
Location: Chennai
Job Type: Full-time
What will you do?
- We're looking for a self-motivated, enthusiastic, and hands-on engineer to set up solid DevOps and SRE foundations. If you thrive in a small, high-energy team and want to play a key role in shaping infrastructure and reliability at scale, this is the place for you.
- The ideal candidate has 3–6 years of experience, a solid grasp of cloud infrastructure, a strong foundation in Infrastructure as Code (IaC), and a keen eye for choosing the right tools for the job. You'll help design, build, and scale resilient infrastructure for a fast-growing, product-driven team.
- Design, build, and manage cloud infrastructure using Infrastructure as Code (IaC) tools like Terraform, Ansible, Chef, or CloudFormation.
- Champion observability by defining SLIs and SLOs and by building robust monitoring, logging, and alerting systems using tools like Prometheus, Grafana, and custom telemetry.
- Ensure availability, scalability, and resilience of our SaaS platform and platform services in production.
- Improve system observability by designing and instrumenting system-level metrics that give better visibility into system health, performance, and bottlenecks.
- Dive deep into complex system architectures to solve critical performance and reliability challenges.
- Work with developers and product teams to embed non-functional requirements (NFRs) into every product and feature release.
- Conduct root cause analysis and system-level debugging (primarily on Linux).
- Build and maintain CI/CD pipelines, automating deployments and infrastructure operations across environments.
- Scale infrastructure to meet growth needs while optimizing cost and performance.
- Take ownership of incident response, on-call rotations, and blameless postmortems.
- Collaborate cross-functionally to drive technical and architectural decisions.
- Highly self-driven, accountable, and eager to own initiatives end-to-end. Comfortable working in startups or small teams, where flexibility, speed, and autonomy are key. Strong communication and cross-team collaboration skills.
You should apply if you have:
- Proficiency in at least one programming language, such as Python or Java.
- Demonstrated experience with performance optimization, latency reduction, and scaling services.
- Strong analytical skills for incident debugging, log analysis, and system troubleshooting.
- Understanding of service-level metrics (SLIs, SLOs, error budgets) and how to operationalize them.
- Experience building large-scale, distributed, resilient systems.
- Strong understanding of core infrastructure components such as load balancers, firewalls, and databases — including their internal workings and operational fundamentals.
- Solid understanding of infrastructure cost management, with a track record of proactively identifying cost drivers, implementing optimization strategies, and contributing to cost-reduction initiatives without compromising reliability or performance.
- Familiarity with on-call responsibilities, incident management, and root cause analysis.
- Strong experience with Infrastructure as Code (IaC) tools such as Terraform, Ansible, Chef, or CloudFormation, as well as other orchestration tools.
- Ability to deep-dive into third-party or internal library codebases to understand internal behavior, debug complex issues, and contribute insights or fixes when needed.
- Solid understanding of cloud platforms — preferably AWS, but Azure or GCP is also acceptable.
-
Site Reliability Engineer
3 weeks ago
Chennai, Tamil Nadu, India | Zyoin Group | Full time
Position: Site Reliability Engineer (SRE). Experience: 4–10 years. Location: Chennai (hybrid, 2 days in office). Role Overview: We are seeking a Site Reliability Engineer (SRE) responsible for leading reliability practices, ensuring scalable systems, and collaborating with development teams to maintain highly available services. Key Responsibilities: Design,...
-
Site Reliability Engineer
2 weeks ago
Chennai, Tamil Nadu, India | Zyoin Group | Full time
Work Mode: Hybrid (2 days in office). We are looking for a Site Reliability Engineer (SRE) who will lead the Site Reliability Engineering (SRE) side of each of our products. This position is responsible for making technical decisions, collaborating with development teams and platform engineers, and building and operating highly reliable and scalable products....
-
Site Reliability Engineer
1 week ago
Chennai, Tamil Nadu, India | Concord | Full time
SRE Sr. Engineers (Individual Contributors). Key Attributes: Strong SRE (Site Reliability Engineering) experience; DevOps skills: CI/CD, monitoring, automation, infrastructure as code, etc.; excellent troubleshooting and debugging skills (infrastructure and application level); perseverance: must push through complex or challenging issues without giving up; able to...
-
Site Reliability Engineer
2 weeks ago
Chennai, Tamil Nadu, India | Zyoin Group | Full time
Job Description. Experience: 4–10 years. Location: Chennai. Work Mode: Hybrid (2 days in office). We are looking for a Site Reliability Engineer (SRE) who will lead the Site Reliability Engineering (SRE) side of each of our products. This position is responsible for making technical decisions, collaborating with development teams and platform engineers, and building and...
-
Site Reliability Engineer
3 weeks ago
Chennai, Tamil Nadu, India | Zyoin Group | Full time
Job Description. Experience: 4–10 years. Location: Chennai. Work Mode: Hybrid (2 days in office). We are looking for a Site Reliability Engineer (SRE) who will lead the Site Reliability Engineering (SRE) side of each of our products. This position is responsible for making technical decisions, collaborating with development teams and platform engineers, and building...
-
Site Reliability Engineer
2 weeks ago
Chennai, Tamil Nadu, India | Intellect Design Arena | Full time | ₹ 5,00,000 - ₹ 8,00,000 per year
Job Title: Site Reliability Engineer. Company: Intellect Design Arena Ltd. Location: Chennai, India. Experience Required: 6+ years. Job Type: Full-time. Department: SRE / DevOps / Engineering Enablement. About Intellect Design Arena Ltd: Intellect Design Arena Ltd is a global leader in digital financial technology, offering cutting-edge solutions for banking, insurance,...
-
Site Reliability Engineer
2 weeks ago
Chennai, Tamil Nadu, India | NatWest Markets | Full time
Job Description: Join us as a Site Reliability Engineer. You'll be managing the provision of stable, resilient, reliable applications with the end goal of minimising disruption to Customer & Colleague Journeys (CCJ). We'll look to you to identify and automate manual tasks and implement observability solutions, ensuring a thorough understanding of CCJ across...
-
Site Reliability Engineer Position
1 week ago
Chennai, Tamil Nadu, India | beBeeEngineering | Full time | ₹ 2,00,00,000 - ₹ 2,50,00,000
Stability, scalability, and operational excellence of Accounting and Finance platforms are critical to success. We're seeking a highly skilled engineer to play a pivotal role in ensuring that these systems operate consistently and reliably. Reliability & Availability: Ensure Accounting and Finance platforms meet defined SLAs, SLOs, and SLIs for...