
Technical Lead/Site Reliability Engineer
2 weeks ago
Description :
Why We Need You - The Mission & Our Vision :
Veryon is a leading software and technology company that enables aviation teams around the world to improve efficiency and safety.
Our products maximize uptime for aircraft maintenance teams through customer-driven innovation and world-class service.
With over 7,500 customers across 137 countries, we serve general and business aviation, military/defense, commercial aviation, and OEMs.
Our values (Fueled by Customers, Win Together, Make It Happen, Innovate to Elevate) are the foundation of everything we do.
As a hands-on Technical Lead in Site Reliability Engineering, you will be directly responsible for designing, building, and implementing modern reliability practices to ensure uptime, resilience, and production excellence across Veryon's systems.
You'll work closely with Engineering, DevOps, and Support teams to streamline software delivery to both internal and client environments, troubleshoot production issues, and build observability using Datadog, Dynatrace, and AWS-native tools.
You will also be a mentor on best practices and a key contributor to reliability-focused architecture and deployment design.
What You'll Accomplish - Your Performance Objectives :
Objective #1 - First 30 Days :
- Complete onboarding and gain a deep understanding of Veryon's systems, release processes, and deployment environment on AWS.
- Review existing application architecture, CI/CD flows, and monitoring implementations.
- Begin implementing improvements to observability using Datadog and Dynatrace.
- Collaborate with engineers and DevOps to identify bottlenecks in production releases and issue resolution.
Objective #2 - First 90 Days :
- Build or enhance monitoring dashboards and alerts for critical infrastructure and applications.
- Define and begin implementing Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets.
- Own and improve release workflows and ensure reliable software delivery to customer environments.
- Take ownership of investigating production issues, ensuring timely resolution and coordination across teams.
- Begin documenting Root Cause Analyses (RCAs) for production incidents and drive preventive improvements.
- Partner with DevOps to optimize and automate CI/CD pipelines using GitLab or equivalent.
Objective #3 - First 12 Months :
- Deliver measurable improvements in system uptime, MTTR, and deployment success rate.
- Build self-healing automation and rollback mechanisms for high-risk services.
- Standardize and own the RCA process for production incidents to ensure continuous learning.
- Implement robust controls and metrics to monitor software delivery health.
- Support production readiness of new services through performance baselining and fault testing.
- Establish and track health KPIs that inform operational decisions and product improvements.
Requirements :
Key Job Responsibilities :
- Implement and manage observability, alerting, and dashboards using Datadog, Dynatrace, and AWS tools.
- Take ownership of production deployments, ensuring successful delivery to client environments with minimal disruption.
- Troubleshoot and resolve production issues across the stack (infrastructure, application, integration).
- Lead Root Cause Analysis (RCA) documentation, follow-ups, and remediation planning.
- Define and maintain service SLOs, SLIs, and error budgets with product and engineering teams.
- Build automation for deployment, monitoring, incident response, and recovery.
- Design CI/CD workflows that support safe and reliable delivery across distributed environments.
- Partner with developers to ensure observability and reliability are part of the application design.
- Mentor engineers in SRE principles, monitoring strategy, and scalable operations.
Experience And Skills We Seek :
- 6+ years of experience in SRE, DevOps, or platform engineering roles.
- Strong hands-on experience with AWS services (e.g., EC2, ECS/EKS, RDS, IAM, CloudWatch, Route 53, ELB) is required.
- Deep familiarity with CI/CD pipelines and deployment strategies using GitLab CI, Jenkins, or equivalent.
- Expertise in observability tools such as Datadog and Dynatrace for APM, logging, and alerting.
- Solid experience troubleshooting distributed systems in production environments.
- Proficiency in scripting and infrastructure as code (e.g., Python, Bash, Terraform, Ansible).
- Working knowledge of containers and orchestration (Docker, Kubernetes).
- Understanding of SRE principles (SLIs, SLOs, MTTR, incident response, etc.).
- Excellent communication and documentation skills, especially for RCA and runbook creation.
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
-
Site Reliability Engineer
3 weeks ago
Chennai, Tamil Nadu, India Zyoin Group Full time. Position: Site Reliability Engineer (SRE) Experience: 4 – 10 Years Location: Chennai (Hybrid – 2 days in office) Role Overview: We are seeking a Site Reliability Engineer (SRE) responsible for leading reliability practices, ensuring scalable systems, and collaborating with development teams to maintain highly available services. Key Responsibilities - Design,...
-
Site Reliability Engineer
3 weeks ago
Chennai, Tamil Nadu, India Zyoin Group Full time. Position: Site Reliability Engineer (SRE) Experience: 4 – 10 Years Location: Chennai (Hybrid – 2 days in office) Role Overview: We are seeking a Site Reliability Engineer (SRE) responsible for leading reliability practices, ensuring scalable systems, and collaborating with development teams to maintain highly available services. Key Responsibilities ...
-
Site Reliability Engineer
3 weeks ago
Chennai, Tamil Nadu, India Zyoin Group Full time. Job Description: Exp: 4 – 10 Years Location: Chennai Work Mode: Hybrid (2 days in office) We are looking for a Site Reliability Engineer (SRE) who will lead the Site Reliability Engineering (SRE) side of each of our products. This position is responsible for making technical decisions, collaborating with development teams and platform engineers, and building...
-
Site Reliability Engineer
2 weeks ago
Chennai, Tamil Nadu, India Zyoin Group Full time. Job Description: Exp: 4 – 10 Years Location: Chennai Work Mode: Hybrid (2 days in office) We are looking for a Site Reliability Engineer (SRE) who will lead the Site Reliability Engineering (SRE) side of each of our products. This position is responsible for making technical decisions, collaborating with development teams and platform engineers, and building and...
-
Site Reliability Engineer
3 hours ago
Chennai, Tamil Nadu, India Elgebra Full time ₹ 6,00,000 - ₹ 18,00,000 per year. Hiring: Site Reliability Engineer – 7+ Years. Location: Bangalore / Chennai. Payroll: Elgebra. Client: Qincline. Joining: Immediate to 15 Days. Role Overview: We are looking for an experienced Site Reliability Engineer (SRE) with over 6 years of expertise to join our team. The ideal candidate will have strong technical skills, a problem-solving mindset, and the...
-
Reliability Engineering Lead
2 weeks ago
Chennai, Tamil Nadu, India beBeeTechnical Full time US$ 8,00,000 - US$ 12,00,000. Job Summary: The Technical Manager for Site Reliability Engineering oversees daily operations, technical mentorship, and strategic alignment with business goals. Key Responsibilities: Lead a remote team ensuring operational excellence and fostering a high-performing culture. Report to the Director of Systems and Security. Oversee day-to-day operations,...
-
Lead Site Reliability Engineer
2 weeks ago
Chennai, Tamil Nadu, India Trimble Full time. Job Description: Lead Site Reliability Engineer. Reporting to: Sr Manager, Availability Management. Office Location: Chennai, India. Flexible Working: Hybrid (Part Office/Part Home). Cloud Site Reliability Engineer Responsibilities - On-board internal customers to our 24x7 Applications Support and Enterprise Status Page services - Be involved with creating an SRE culture...
-
Site Reliability Engineer
3 weeks ago
Chennai, Tamil Nadu, India ViaSat Full time. About us: One team. Global challenges. Infinite opportunities. At Viasat, we're on a mission to deliver connections with the capacity to change the world. For more than 35 years, Viasat has helped shape how consumers, businesses, governments, and militaries around the globe communicate. We're looking for people who think big, act fearlessly, and create an...
-
Site Reliability Engineer
1 week ago
Chennai, Tamil Nadu, India Concord Full time. SRE Sr. Engineers (Individual Contributors). Key Attributes: Strong SRE (Site Reliability Engineering) experience; DevOps skills – CI/CD, monitoring, automation, infrastructure as code, etc.; Excellent troubleshooting and debugging skills (infrastructure + application level); Perseverance – must push through complex/challenging issues without giving up; Able to...