Site Reliability Engineer – Technical Lead

2 days ago


Chennai, Tamil Nadu, India | Veryon | Full time | ₹ 20,00,000 - ₹ 25,00,000 per year

Description
Why We Need You – The Mission & Our Vision
Veryon is a leading software and technology company that enables aviation teams around the world to improve efficiency and safety. Our products maximize uptime for aircraft maintenance teams through customer-driven innovation and world-class service.

With over 7,500 customers across 137 countries, we serve general and business aviation, military/defense, commercial aviation, and OEMs. Our values—Fueled by Customers, Win Together, Make It Happen, Innovate to Elevate—are the foundation of everything we do.

As a hands-on Technical Lead in Site Reliability Engineering, you will be directly responsible for designing, building, and implementing modern reliability practices to ensure uptime, resilience, and production excellence across Veryon's systems. You'll work closely with Engineering, DevOps, and Support teams to streamline software delivery to both internal and client environments, troubleshoot production issues, and build observability using Datadog, Dynatrace, and AWS-native tools. You will also be a mentor on best practices and a key contributor to reliability-focused architecture and deployment design.

What You'll Accomplish – Your Performance Objectives

Objective #1 – First 30 Days

  • Complete onboarding and gain a deep understanding of Veryon's systems, release processes, and deployment environment on AWS.
  • Review existing application architecture, CI/CD flows, and monitoring implementations.
  • Begin implementing improvements to observability using Datadog and Dynatrace.
  • Collaborate with engineers and DevOps to identify bottlenecks in production releases and issue resolution.

Objective #2 – First 90 Days

  • Build or enhance monitoring dashboards and alerts for critical infrastructure and applications.
  • Define and begin implementing Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets.
  • Own and improve release workflows and ensure reliable software delivery to customer environments.
  • Take ownership of investigating production issues, ensuring timely resolution and coordination across teams.
  • Begin documenting Root Cause Analyses (RCAs) for production incidents and drive preventive improvements.
  • Partner with DevOps to optimize and automate CI/CD pipelines using GitLab or equivalent.

Objective #3 – First 12 Months

  • Deliver measurable improvements in system uptime, MTTR, and deployment success rate.
  • Build self-healing automation and rollback mechanisms for high-risk services.
  • Standardize and own the RCA process for production incidents to ensure continuous learning.
  • Implement robust controls and metrics to monitor software delivery health.
  • Support production readiness of new services through performance baselining and fault testing.
  • Establish and track health KPIs that inform operational decisions and product improvements.

Requirements
Key Job Responsibilities

  • Implement and manage observability, alerting, and dashboards using Datadog, Dynatrace, and AWS tools.
  • Take ownership of production deployments, ensuring successful delivery to client environments with minimal disruption.
  • Troubleshoot and resolve production issues across the stack (infrastructure, application, integration).
  • Lead Root Cause Analysis (RCA) documentation, follow-ups, and remediation planning.
  • Define and maintain service SLOs, SLIs, and error budgets with product and engineering teams.
  • Build automation for deployment, monitoring, incident response, and recovery.
  • Design CI/CD workflows that support safe and reliable delivery across distributed environments.
  • Partner with developers to ensure observability and reliability are part of the application design.
  • Mentor engineers in SRE principles, monitoring strategy, and scalable operations.

Experience and Skills We Seek

  • 6+ years of experience in SRE, DevOps, or platform engineering roles.
  • Strong hands-on experience with AWS services (e.g., EC2, ECS/EKS, RDS, IAM, CloudWatch, Route 53, ELB) is required.
  • Deep familiarity with CI/CD pipelines and deployment strategies using GitLab CI, Jenkins, or equivalent.
  • Expertise in observability tools such as Datadog and Dynatrace for APM, logging, and alerting.
  • Solid experience troubleshooting distributed systems in production environments.
  • Proficiency in scripting and infrastructure as code (e.g., Python, Bash, Terraform, Ansible).
  • Working knowledge of containers and orchestration (Docker, Kubernetes).
  • Understanding of SRE principles (SLIs, SLOs, MTTR, incident response, etc.).
  • Excellent communication and documentation skills, especially for RCA and runbook creation.
  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

How We Work – The Core Values That We Live By
Fueled by Customers – Everything we do is to help our customers increase uptime. Transparent communication and customer empathy drive our decisions.

Win Together – Collaboration across teams is our core strength. We believe every person is vital to our success.

Make It Happen – We take initiative, follow through, and adapt as needed. We take ownership and tackle tough challenges.

Innovate to Elevate – We embrace change, experiment boldly, and continuously improve. We lead by setting a high bar for ourselves and our industry.


