Site Reliability Engineer

1 week ago


Location: Gurgaon, Haryana, India
Company: Aerial Telecom Solutions (ATS)
Employment type: Full time
Salary: ₹ 1,04,000 - ₹ 1,30,878 per year

Position Overview:

The SRE Lead will manage a team of engineers focused on software deployments and site reliability engineering practices. The role involves overseeing the deployment of software applications and services; implementing automation, monitoring, and alerting tools; and ensuring the reliability, availability, and performance of critical systems and services. As Deployments and SRE Manager, the SRE Lead will collaborate closely with development, operations, and other stakeholders to drive a culture of DevOps and SRE, with the aim of improving system stability, scalability, and resilience.

Key Responsibilities:

Leadership: Lead and mentor a team of engineers responsible for software deployments and SRE practices. Set clear expectations, provide coaching and feedback, and foster a collaborative and innovative team environment.

Deployment Management: Implement and manage the deployment process for software applications and services, including monthly release management of AADL products, change management, and rollback procedures.

Drive continuous improvement in deployment processes and tools to increase efficiency and minimize risk.

Site Reliability Engineering: Implement best practices in site reliability engineering, including system monitoring, alerting, capacity planning, performance optimization, and incident management.

Collaborate with development teams to ensure application architectures are resilient and scalable, and drive the adoption of DevOps and SRE principles and practices.

Automation and Tooling: Evaluate, implement, and maintain relevant automation and tooling to streamline operational tasks, reduce manual effort, and improve system reliability.

This may include configuration management, containerization, and orchestration technologies. The candidate should be well versed in the blue-green and canary deployment models.
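For readers less familiar with the canary model: a new release first receives a small share of traffic, and its health is compared with the existing (baseline) release before more traffic is shifted; blue-green, by contrast, switches all traffic between two full environments at once. The sketch below shows one possible promotion check for a canary step. It is a hypothetical policy, assuming error rate is the only health signal; `evaluate_canary`, `CanaryDecision`, and all thresholds are illustrative names and values, not part of any specific tool.

```python
from enum import Enum


class CanaryDecision(Enum):
    CONTINUE = "continue"  # not enough canary traffic yet to judge
    PROMOTE = "promote"    # canary looks healthy; shift more traffic to it
    ROLLBACK = "rollback"  # canary is worse than baseline; revert it


def evaluate_canary(
    baseline_errors: int,
    baseline_requests: int,
    canary_errors: int,
    canary_requests: int,
    max_relative_increase: float = 0.10,  # hypothetical: tolerate 10% worse
    absolute_floor: float = 0.001,        # hypothetical: 0.1% error-rate floor
    min_canary_requests: int = 500,       # hypothetical sample-size minimum
) -> CanaryDecision:
    """Hypothetical policy: compare canary error rate with the baseline's."""
    if baseline_requests <= 0:
        raise ValueError("baseline_requests must be positive")
    # Wait until the canary has served enough requests for a meaningful rate.
    if canary_requests < min_canary_requests:
        return CanaryDecision.CONTINUE
    baseline_rate = baseline_errors / baseline_requests
    canary_rate = canary_errors / canary_requests
    # Allow a small relative regression, but never use a threshold below the
    # floor, so a zero-error baseline does not make one canary error fatal.
    threshold = max(baseline_rate * (1 + max_relative_increase), absolute_floor)
    if canary_rate <= threshold:
        return CanaryDecision.PROMOTE
    return CanaryDecision.ROLLBACK
```

For example, with a baseline of 50 errors in 10,000 requests, a canary with 6 errors in 1,000 requests would be rolled back, while 5 errors in 1,000 would be promoted. In practice such a check would run at each traffic step, and progressive-delivery tools (for example, Argo Rollouts or Flagger) automate this kind of metric analysis.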

Incident Management: Lead incident management efforts, including incident response, root cause analysis, and post-incident reviews. Collaborate with cross-functional teams to minimize impact and restore services as quickly as possible.

Implement preventive measures to avoid future incidents and drive continuous improvement in incident management processes.

Monitoring and Alerting: Implement and maintain effective system monitoring and alerting tools to proactively detect and resolve issues.

Define and track key performance indicators (KPIs) and service level objectives (SLOs) to measure system reliability, performance, and availability.
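As an illustration of what tracking an SLO can involve, the sketch below computes availability and the remaining error budget for one measurement window. It assumes a request-based availability SLI (successful requests divided by total requests); `evaluate_slo` and `SloReport` are hypothetical names, not part of any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    availability: float      # observed fraction of successful requests
    error_budget: float      # number of failed requests the SLO allows
    budget_remaining: float  # fraction of the error budget still unspent


def evaluate_slo(total_requests: int, failed_requests: int, slo_target: float) -> SloReport:
    """Hypothetical helper: request-based availability SLO for one window."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    availability = (total_requests - failed_requests) / total_requests
    # The error budget is the share of requests the target permits to fail.
    error_budget = (1.0 - slo_target) * total_requests
    # A negative value means the budget has been overspent in this window.
    budget_remaining = 1.0 - failed_requests / error_budget
    return SloReport(availability, error_budget, budget_remaining)
```

With a 99.9% target over 1,000,000 requests, the budget is roughly 1,000 failed requests; 400 failures leave about 60% of it unspent. Teams often use the remaining budget to decide whether to keep shipping releases or to prioritize reliability work.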

Collaboration: Collaborate closely with development, operations, security, network, and other stakeholders to ensure smooth operations and timely resolution of issues. Foster strong relationships and effective communication channels to promote collaboration and coordination.

Documentation: Maintain comprehensive documentation of deployment processes, system configurations, procedures, and incident reports. Ensure documentation is up-to-date, accurate, and accessible to relevant stakeholders.

Qualifications:

  • Bachelor's degree in Computer Science, Information Technology, or related field.
  • Minimum of 7 years of experience in software engineering, DevOps, deployments, or site reliability engineering.
  • Strong technical skills in deployment processes and practices, such as release management, change management, and rollback procedures.
  • Proficient in scripting and automation using languages such as Python, Bash, or PowerShell.
  • Solid understanding of DevOps principles, Agile methodologies, and ITIL practices.
  • Strong technical skills in CI/CD tools and practices, such as Jenkins, Git, Docker, Kubernetes, and related technologies.
  • Strong leadership skills with experience in managing and mentoring technical teams.
  • Excellent problem-solving, analytical, and communication skills.
  • Ability to work independently, prioritize tasks, and manage time effectively.
  • Experience with incident management tools and processes, such as ITIL Incident Management, and familiarity with ITSM frameworks.
  • In-depth knowledge of relational database management systems (RDBMS) such as Oracle, Microsoft SQL Server, MySQL, or PostgreSQL.
  • Knowledge of cloud computing platforms (preferably AWS) is a plus.
  • Relevant certifications, such as AWS Certified DevOps Engineer, Certified Kubernetes Administrator (CKA), or Site Reliability Engineering (SRE) certifications, are desirable, as is Grafana expertise.

