HCL - Lead Site Reliability Engineer - Incident Management & Application Monitoring

3 weeks ago


Chennai, Tamil Nadu, India | HCL Technologies | Full time

We are seeking a talented Lead Site Reliability Engineer (SRE) with a focus on incident management and application monitoring to join our dynamic team. The ideal candidate will have a strong background in both software engineering and systems administration, with a passion for ensuring the reliability, scalability, and performance of our systems and applications. This role requires proactive monitoring, rapid incident response, and continuous improvement of our monitoring and alerting infrastructure.

Key Responsibilities :


Incident Management :

  • Lead the incident response process, including detection, escalation, resolution, and post-mortem analysis.
  • Coordinate with cross-functional teams to diagnose and resolve critical incidents in a timely manner.
  • Develop and maintain incident response runbooks and escalation procedures.
  • Implement improvements to incident management processes to reduce mean time to resolution (MTTR) and minimize service disruptions.

Application Monitoring :

  • Design, deploy, and maintain monitoring solutions for applications, infrastructure, and services.
  • Define key performance indicators (KPIs) and service level objectives (SLOs) for monitoring.
  • Develop custom metrics, dashboards, and alerts to provide actionable insights into system health and performance.
  • Continuously evaluate and enhance monitoring tools and methodologies to ensure effectiveness and relevance.

Automation and Tooling :

  • Develop automation scripts and tools to streamline incident response, monitoring, and maintenance tasks.
  • Implement self-healing mechanisms and automated remediation for common issues.
  • Leverage infrastructure as code (IaC) principles to automate deployment and configuration of monitoring infrastructure.

Capacity Planning and Performance Optimization :

  • Collaborate with engineering teams to forecast capacity requirements and plan for scaling our infrastructure and applications.
  • Conduct performance analysis and optimization to ensure optimal resource utilization and cost efficiency.
  • Identify bottlenecks and areas for improvement through performance monitoring and analysis.

Documentation and Knowledge Sharing :

  • Document incident response procedures, troubleshooting steps, and best practices.
  • Share knowledge and mentor team members on incident management and monitoring techniques.
  • Contribute to the development of internal documentation, guides, and training materials.

Requirements :

  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent work experience.
  • Proven experience as a Site Reliability Engineer or similar role, with a focus on incident management and application monitoring.
  • Proficiency in programming/scripting languages such as Python, Go, or Bash.
  • Hands-on experience with monitoring tools such as Prometheus, Grafana, Datadog, or similar.
  • Familiarity with incident management tools and processes, such as PagerDuty or OpsGenie.
  • Strong understanding of cloud platforms (e.g., AWS, GCP, Azure) and container orchestration (e.g., Kubernetes).
  • Excellent troubleshooting and problem-solving skills, with meticulous attention to detail.
  • Effective communication skills and the ability to collaborate with cross-functional teams.
  • Experience with infrastructure as code (IaC) tools like Terraform or CloudFormation is a plus.
  • Relevant certifications such as AWS Certified DevOps Engineer or Google Professional Cloud DevOps Engineer are advantageous.

Benefits :

  • Competitive salary and performance-based bonuses.
  • Comprehensive health benefits package.
  • Flexible work schedule and remote work options.
  • Continuous learning and professional development opportunities.
  • Collaborative and inclusive work environment with a focus on innovation and excellence.

Join our team and play a key role in ensuring the reliability and performance of our mission-critical systems and applications.
(ref:hirist.tech)
