Lead Site Reliability Engineer

7 days ago

New Delhi, India SITA Full time

Overview

WELCOME TO SITA

We're the team that keeps airports moving, airlines flying smoothly, and borders open. Our tech and communication innovations are the secret behind the success of the world's air travel industry.

You'll find us at 95% of international hubs. We partner closely with over 2,500 transportation and government clients, each with their own unique needs and challenges. Our goal is to find fresh solutions and cutting-edge tech to make their operations run like clockwork. Want to be a part of something big?

Are you ready to love your job? The adventure begins right here, with you, at SITA.

PURPOSE

KEY RESPONSIBILITIES

Define, build, and maintain support systems to ensure high availability and performance. Work closely with Product, Engineering, and Service Support architects as the Operations technical expert on productizing new products, and review non-standard bids for operational feasibility. Ensure Operations is ready to support new products and that its teams are trained to support them effectively.

SREs are responsible for making sure that the systems and services they support meet the non-functional requirements (NFRs) defined by the business, the users, and the organization. They are the guardians of reliability and availability, ensuring that systems perform as expected, scale appropriately, and are resilient to failures.

Defining and Understanding NFRs:

SREs collaborate with stakeholders to understand and define NFRs, such as performance targets (response time, throughput), scalability limits, security requirements (encryption, authentication), and maintainability goals (ease of updates, error handling). They help translate these abstract requirements into concrete, measurable metrics: Service Level Objectives (SLOs) and Service Level Indicators (SLIs).

Designing and Implementing Reliable Systems:

SREs design and implement systems that are resilient to failures and can meet the defined NFRs even under stress. This includes implementing fault-tolerant, zero-downtime architectures using techniques such as redundancy, load balancing, and automated failover. SREs also focus on building scalable systems: they proactively analyse network performance data and capacity requirements to ensure the network can handle current and future demands without performance degradation.

Monitoring and Incident Response:

SREs implement robust monitoring systems to track key metrics related to NFRs. They set up alerts and notifications to proactively identify and address potential issues before they impact users. SREs must define and maintain an event catalog specifying thresholds for active events, propose and implement relevant remediation, and optimize it for efficiency. They develop event response protocols, train teams on them, and ensure quick and efficient handling of incidents.

When incidents occur, SREs are responsible for quickly diagnosing and resolving complex network incidents, which includes troubleshooting, conducting root cause analysis, and implementing preventative measures. Perform root cause analysis of critical incidents (e.g. P1, P2) and critical system failures to maintain high availability and prevent recurrence. Act as the highest technical escalation contact and technical expert for complex cases in Portfolio service operations. Be accountable within SGS for the high availability and performance of the in-scope product or solution. Serve as the domain's technical expert and point of contact for engineering, management, operations, and product. Optimize network performance by analysing traffic patterns, identifying bottlenecks, and implementing solutions.

Coordinate with incident management teams, operations experts, and the various application and platform Portfolio service operations and Engineering teams to develop and implement permanent solutions. Conduct thorough problem investigations using trend analysis to diagnose recurring incidents and find permanent solutions. Run the weekly problem review board, monitor the effectiveness of problem resolution activities, and provide regular reports on problem management activities to ensure continuous improvement.

Deployment and Release Management:

SREs work closely with Engineering teams to facilitate smooth and reliable deployments of new software releases. They implement deployment strategies, clear processes, standard operating procedures (SOPs), and rollback plans to minimize risk and reduce downtime during releases. They ensure that new features and changes are deployed in a controlled manner, minimizing the impact on existing services. Track deployment progress, conduct operational readiness assessments after successful deployments, and mitigate risks or improve the deployment plan to ensure service stability.

DevOps/NetOps Management:

Manage continuous integration and deployment (CI/CD) pipelines, ensuring smooth integration between development and operations teams. Build scripts to automate network tasks, reducing manual effort, toil, and human error. Implement automation for system provisioning, self-healing (auto-recovery), deployment, system health checks, and event-to-incident monitoring with proper correlation. Implement and manage infrastructure as code, provide ongoing support for automation tools, and continuously improve DevOps/NetOps practices. Create and maintain documentation for network configurations, SOPs, and troubleshooting guides.

Qualifications

EXPERIENCE

8+ years of experience in IT operations service management or infrastructure management, including roles such as Site Reliability Engineer or NetOps Engineer/Expert.

Airline experience and/or ATI know-how is good to have.
Expertise in troubleshooting issues with data center and cloud setups.
Expertise in technologies such as Cisco routing and switching, Cisco ACI, Cisco Nexus, Aruba, ClearPass, Juniper Mist, or any other wireless technology.
Hands-on experience with CI/CD pipeline automation systems, performance monitoring, and the implementation of infrastructure as code.
Experience in a NetOps working environment.
Experience in automation and scripting.
Proven experience in managing high-availability systems and ensuring operational reliability.
Extensive experience in root cause analysis (RCA), incident management, and developing permanent solutions for recurring service disruptions.

KNOWLEDGE & SKILLS

At least one of Terraform, Python, or another language for automation and scripting is a must.
Knowledge of Git processes is good to have.
CI/CD pipeline tools such as GitHub are good to have.
Experience implementing architectural standards in pipelines.

Other Networking technologies:

Cisco routing and switching is a must.
Cisco ACI.
Load balancers.
Any wireless technology: Juniper Mist, Aruba AP, or Cisco.
Cisco data center switches such as Nexus are a must.
Aruba ClearPass knowledge is good to have.
Palo Alto firewalls are good to have.
Knowledge of or experience with cloud platforms (AWS, Azure, Google Cloud) and their networking services is good to have.
Familiarity with operating systems (Linux, Windows) and system-level troubleshooting is good to have.
Understanding of ITIL or other incident management frameworks.
Ability to effectively communicate technical information and collaborate with diverse teams.

PROFESSION COMPETENCIES

CORE COMPETENCIES

Adhering to SITA Principles & Values
Good Communication
Creating & Innovating
Customer Focus
Impact & Influence
Leading Execution
Results Orientation
Teamwork

EDUCATION & QUALIFICATIONS

Bachelor's or Master's degree in Computer Science, Information Technology, Engineering, or a related field. Relevant certifications such as CCIE (Data Center or Routing & Switching), or expert-level certification in Juniper Mist or Aruba, and in Palo Alto firewalls. Certifications in cloud platforms (AWS, Azure, Google Cloud) or DevOps methodologies (e.g. Certified DevOps Professional) are good to have. ITIL certification.

WHAT WE OFFER

We're all about diversity. We operate in 200 countries and span 60 different languages and cultures. We're really proud of our inclusive environment. Our offices are comfortable and fun places to work, and we make sure you get to work from home too. Find out what it's like to join our team and take a step closer to your best life ever.

Flex Week: Work from home up to 2 days/week (depending on your team's needs)

Flex Day: Make your workday suit your life and plans.

Flex-Location: Take up to 30 days a year to work from any location in the world.

Employee Wellbeing: We have got you covered with our Employee Assistance Program (EAP), for you and your dependents 24/7, 365 days/year. We also offer Champion Health - a personalized platform that supports a range of wellbeing needs.

Professional Development: Level up your skills with our training platforms, including LinkedIn Learning

Competitive Benefits: Competitive benefits that make sense with both your local market and employment status.

SITA is an Equal Opportunity Employer. We value a diverse workforce. In support of our Employment Equity Program, we encourage women, aboriginal people, members of visible minorities, and/or persons with disabilities to apply and self-identify in the application process.
