Site Reliability Engineer

3 weeks ago


Noida, Uttar Pradesh, India HCLTech Full time

Job Title: Site Reliability Engineer (SRE) - LEAD

Department: CoE

Job Summary:

We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our engineering team. As an SRE, you will be responsible for ensuring the reliability, availability, and performance of our systems and services. You will work closely with development and operations teams to build scalable infrastructure, automate processes, and respond to incidents effectively.

Key Responsibilities:

Strategic Leadership & Governance

  • Define and evolve the SRE CoE vision, strategy, and roadmap.
  • Establish enterprise-wide SRE standards, frameworks, and maturity models.
  • Drive adoption of SRE principles across product and platform teams.

Enablement

  • Act as a subject matter expert and advisor to engineering teams on reliability, scalability, and performance.
  • Conduct workshops, training sessions, and knowledge-sharing forums.
  • Promote a culture of observability, automation, and continuous improvement.

Collaboration & Mentorship

  • Partner with engineering, product, and operations leaders to align reliability goals with business outcomes.
  • Mentor SREs and engineers across teams, fostering a community of practice.
  • Lead cross-functional reliability reviews and architecture assessments.
  • Collaborate with development, operations, and network teams.
  • Align infrastructure reliability with application SLOs/SLIs.
  • Advocate for best practices in system architecture and operations.

Infrastructure & Reliability

  • Design, implement, and maintain scalable, reliable infrastructure.
  • Define and maintain high-availability and disaster recovery strategies.
  • Improve reliability for legacy and hybrid (cloud/on-prem) systems.

Monitoring & Incident Management

  • Develop and maintain monitoring, alerting, and incident response systems.
  • Conduct root cause analysis and post-mortems.
  • Participate in on-call rotations and respond to production issues.

Automation & Efficiency

  • Automate repetitive tasks using scripting and tooling.
  • Lead Infrastructure-as-Code (IaC) and automation for provisioning and scaling.
  • Create sustainable systems through automation and continuous improvement.
  • Evaluate and recommend tools for monitoring, alerting, incident management, and chaos engineering.
  • Build reusable automation frameworks and templates for onboarding teams to SRE practices.
  • Collaborate with DevOps and platform teams to integrate reliability tooling into CI/CD pipelines.
  • Support rigorous testing and release procedures.

Performance & Capacity

  • Lead capacity planning, system upgrades, and OS patching.
  • Gather and analyze system/application metrics for performance tuning.

Containerization & Cloud

  • Support Kubernetes and container platforms in hybrid environments.
  • Work with OpenShift, GCP, Azure, and AWS for cloud-integrated services.

Required Qualifications:

  • Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).
  • 8+ years of experience in SRE.
  • Proficiency in at least one programming/scripting language (e.g., Python).
  • Experience with cloud platforms (AWS, GCP, Azure).

Preferred Qualifications:

  • Experience in setting up or leading a CoE or similar strategic function.
  • Certifications in cloud, DevOps, or SRE-related domains.
  • Experience with chaos engineering and resilience testing.
  • Experience with observability tools (Prometheus, Grafana, ELK, Datadog, etc.).
  • Experience with incident management and SLO/SLI/SLA frameworks.


  • Noida, Uttar Pradesh, India CorroHealth Full time

    Hiring Alert: We are looking for a highly skilled Lead Site Reliability Engineer (SRE) for our Product Development team based in Noida. Only immediate joiners preferred. Job Description: We are seeking a highly skilled Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a deep understanding of both software engineering and...


  • Noida, Uttar Pradesh, India beBeeReliable Full time

    Reliable System Engineer Position. We are seeking an experienced Reliable System Engineer to join our team. As a key member of our infrastructure group, you will be responsible for ensuring the reliability and scalability of our systems. Key Responsibilities: Monitoring & Alerting: Design and implement monitoring and alerting systems using Datadog to proactively...


  • Noida, Uttar Pradesh, India Microsoft Full time ₹ 15,00,000 - ₹ 20,00,000 per year

    Do you want to work on a product that is used by millions of people around the world daily, and growing rapidly? Do you care deeply about how software is designed with a focus on supporting global scale? Do you want to be part of a world-class team that continuously pushes the boundary of service and engineering excellence? The Web Experience and Services...


  • Noida, Uttar Pradesh, India Celsior Full time

    This individual will play a crucial, client-facing role in Application Performance Monitoring (APM), User Experience Monitoring (UEM), and Site Reliability Engineering (SRE) solutions, translating client requirements into scalable and effective implementations. Valid Dynatrace certification is mandatory. Take complete charge of the Dynatrace Architecture,...