HCL - Lead Site Reliability Engineer - Incident Management & Application Monitoring
3 weeks ago
We are seeking a talented Lead Site Reliability Engineer (SRE) with a focus on incident management and application monitoring to join our dynamic team. The ideal candidate will have a strong background in both software engineering and systems administration, with a passion for ensuring the reliability, scalability, and performance of our systems and applications. This role requires proactive monitoring, rapid incident response, and continuous improvement of our monitoring and alerting infrastructure.
Key Responsibilities :
Incident Management :
- Lead the incident response process, including detection, escalation, resolution, and post-mortem analysis.
- Coordinate with cross-functional teams to diagnose and resolve critical incidents in a timely manner.
- Develop and maintain incident response runbooks and escalation procedures.
- Implement improvements to incident management processes to reduce mean time to resolution (MTTR) and minimize service disruptions.
Application Monitoring :
- Design, deploy, and maintain monitoring solutions for applications, infrastructure, and services.
- Define key performance indicators (KPIs) and service level objectives (SLOs) for monitoring.
- Develop custom metrics, dashboards, and alerts to provide actionable insights into system health and performance.
- Continuously evaluate and enhance monitoring tools and methodologies to ensure effectiveness and relevance.
Automation and Tooling :
- Develop automation scripts and tools to streamline incident response, monitoring, and maintenance tasks.
- Implement self-healing mechanisms and automated remediation for common issues.
- Leverage infrastructure as code (IaC) principles to automate deployment and configuration of monitoring infrastructure.
Capacity Planning and Performance Optimization :
- Collaborate with engineering teams to forecast capacity requirements and plan for scaling our infrastructure and applications.
- Conduct performance analysis and optimization to ensure optimal resource utilization and cost efficiency.
- Identify bottlenecks and areas for improvement through performance monitoring and analysis.
Documentation and Knowledge Sharing :
- Document incident response procedures, troubleshooting steps, and best practices.
- Share knowledge and mentor team members on incident management and monitoring techniques.
- Contribute to the development of internal documentation, guides, and training materials.
Requirements :
- Bachelor's degree in Computer Science, Engineering, or related field, or equivalent work experience.
- Proven experience as a Site Reliability Engineer or similar role, with a focus on incident management and application monitoring.
- Proficiency in programming/scripting languages such as Python, Go, or Bash.
- Hands-on experience with monitoring tools such as Prometheus, Grafana, Datadog, or similar.
- Familiarity with incident management tools and processes, such as PagerDuty or OpsGenie.
- Strong understanding of cloud platforms (e.g., AWS, GCP, Azure) and container orchestration (e.g., Kubernetes).
- Excellent troubleshooting and problem-solving skills, with meticulous attention to detail.
- Effective communication skills and the ability to collaborate with cross-functional teams.
- Experience with infrastructure as code (IaC) tools like Terraform or CloudFormation is a plus.
- Relevant certifications such as AWS Certified DevOps Engineer or Google Professional Cloud DevOps Engineer are advantageous.
Benefits :
- Competitive salary and performance-based bonuses.
- Comprehensive health benefits package.
- Flexible work schedule and remote work options.
- Continuous learning and professional development opportunities.
- Collaborative and inclusive work environment with a focus on innovation and excellence.
Join our team and play a key role in ensuring the reliability and performance of our mission-critical systems and applications.
-
Site Reliability Engineer Lead
3 weeks ago
Chennai, Tamil Nadu, India Corpxcel Consulting Full time
Job Description :
- For SRE coach we need someone with 10+ yrs of exp (female candidates requirement)
- Have extensive experience as an agile coach with good knowledge of SRE
- Super communication skill
- Document the SRE manual and training material
- Who can lead/coach the SRE team
- Establish the processes, and work with multiple teams evangelising the...
-
Site Reliability Engineer
3 weeks ago
Chennai, Tamil Nadu, India Ford Business Solutions Full time
Short Description: A site reliability engineer (SRE) is a role that combines software engineering and systems engineering to ensure that a software system is available, scalable, and maintainable 24*7*365, in an "Always ON" manner, for Ford's e-Commerce Platform.
Description for Internal Candidates: Strong background in software development and systems...
-
Site Reliability Engineer
3 weeks ago
Chennai, Tamil Nadu, India Corpxcel Consulting Full time
For SRE :
- Have experience in automation
- Operational Knowledge in any of the CICD Tooling Technologies
- Understanding of the cloud deployments and SRE
- 5-8 years of solid, diverse work experience in a Java development and DevOps Platform Engineering with Development Disciplines in a high pace Production Environment
- At least 3 years of experience with Java...
-
Middleware Site Reliability Engineer
4 hours ago
Chennai, Tamil Nadu, India NUSTAR TECHNOLOGIES INDIA PRIVATE LIMITED Full time
Job Description :
- Understanding of connection mechanism in application.
- Like application to application/DB.
- Understanding of logs and analysis.
- Should be able to evaluate issues and decide which of them get priority, modify existing software and documentation,
- Train other members on any changes made and implement a plan for future improvements.
- ...
-
Incident Manager
3 weeks ago
Bangalore/Chennai, Tamil Nadu, India ALP Consulting Full time
Responsibilities and Challenges :
- Record and classify received Incidents and undertake an immediate effort in order to restore a failed IT Service as quickly as possible;
- Conducts escalation to service teams, senior management and leaders to ensure appropriate awareness, engagement and focus;
- Leveraging technology to issue all communications and...
-
Site Reliability Engineer
3 weeks ago
Chennai, Tamil Nadu, India Mo Full time
Job Description :
- Experience leading a small team of engineers/SREs, working in an onsite/offshore model.
- 5+ years of experience working with distributed microservice architecture/message queues, with strong programming/system fundamentals
- Strong experience working with databases (Postgres/MySQL, NoSQL DBs)
- 5+ years of experience in Cloud environment such as AWS, GCP...
-
Database Engineer
4 hours ago
Chennai, Tamil Nadu, India NUSTAR TECHNOLOGIES INDIA PRIVATE LIMITED Full time
Job Description :
- Serve as the primary subject matter expert on database technologies and architectures.
- Collaborate with development teams to design and implement database schemas and structures.
- Provide guidance on database performance tuning and optimization strategies.
- Conduct thorough analysis of database systems to identify areas for improvement...
-
Lead DevSecOps Engineer
3 weeks ago
Chennai, Tamil Nadu, India Freelancer Recruiter Full time
Job Description :
- 5-8 years of solid, diverse work experience in a DevOps Platform Engineering with Development Disciplines in a high pace Production Environment
- Bachelor's degree in Technical/Systems discipline or related experience required
- Proven Understanding with Cloud deployments (Private Cloud / AWS / Azure / Docker/ Kubernetes)
- Platform or...
-
Novacis Digital
1 week ago
Chennai, Tamil Nadu, India Novacis Digital Full time
Job Description :
We are seeking a skilled DevOps Lead Engineer with 6 to 10 yrs. of experience who handles the entire DevOps lifecycle and is accountable for the implementation of the process. A DevOps Lead Engineer is responsible for automating all the manual tasks for developing and deploying code and data to implement continuous deployment and continuous...
-
SRE Engineer Senior
7 days ago
Chennai, Tamil Nadu, India FIS Global Full time
Position Type : Full time
Type Of Hire : Experienced (relevant combo of work and education)
Education Desired : Bachelor's Degree
Travel Percentage : 0%
Job Title: SRE
Location: India
Site Reliability Engineer
We are the FIS Financial Intelligence team, and our mission is to enable financial businesses across the world to protect every financial transaction. We...
-
Development Manager Sr
4 weeks ago
Chennai, Tamil Nadu, India FIS Global Full time
Position Type : Full time
Type Of Hire : Experienced (relevant combo of work and education)
Education Desired : Bachelor of Computer Science
Travel Percentage : 1 - 5%
We are the FIS Financial Intelligence team, and our mission is to enable financial businesses across the world to protect every financial transaction. We are developing a cutting-edge platform...
-
Bangalore/Chennai, Tamil Nadu, India CGI Information Systems and Management Consultants Full time
Looking for experienced AWS Data Engineer / Developer / Lead proficient in Lambda or Kinesis or Redshift, adept at architecting and deploying scalable serverless applications. Specialized in real-time data processing, data warehousing, and API development, with a strong focus on security, optimization, and collaborative problem-solving.
Job Title : AWS Data...
-
Civil Site Engineer
3 weeks ago
Chennai, Tamil Nadu, India Saisource Solutions Full time
We have immediate openings for Civil Site Engineers, for knowledgeable freshers as well as experienced candidates.
Experience : 0 - 3 Years
No. of Openings : 10
Education : Any Bachelor Degree
Role : Civil Site Engineer
Industry Type : Real Estate / Property / Construction
Gender : [ Male / Female ]
Job Country : India
-
Chennai, Tamil Nadu, India Vestas Wind Technology India Pvt Ltd Full time
Apply for Engineer / Senior Engineer Condition Monitoring, Career Progress Consultants in Chennai for 4 - 9 Year of Experience on
-
Senior Application Developer
3 weeks ago
Chennai/Bangalore, Tamil Nadu, India Domniclewis Full time
Intermediate Applications Developer (PowerApps with Canvas App)
Exp : 5 - 10 years of experience (Mandate)
Skills Required :
- Canvas app in PowerApps (Mandate)
- Power Automate (Mandate)
- Power BI (Optional)
- Model Driven and Dataverse experience (Optional)
- Excellent Communication Skills (Mandate)
- Good Attitude
Individual Contributor Role
Job Description :-...
-
Engineering Manager
2 weeks ago
Chennai, Tamil Nadu, India Talent Syndicate Private Limited Full time
What you will do:
- Collaborate closely with business, product, and engineering teams to translate goals into clear and actionable engineering roadmaps.
- Lead the charge on defining and executing strategic engineering plans for your assigned areas, ensuring efficient project delivery.
- Champion a culture of excellence by establishing and maintaining robust...
-
Sr Manager
2 weeks ago
Chennai, Tamil Nadu, India timesjobs Full time
Sr Manager - Network Security Platform
Date: 21 Aug 2023
Location: Chennai, India
Company: Tata Communications
Job Family Descriptor:
Create medium longterm optimal cost-effective scalable network capacity plans and provide innovative solutions for managing capacity requirements
Identify future backbone network requirements to meet requirements for all lines of...
-
Senior DevOps Engineer
3 weeks ago
Chennai, Tamil Nadu, India HNM Solutions Full time
We are hiring : Alibaba Cloud - Senior DevOps Engineer
Location : Remote (Only South Indian)
Total Experience : 6+ years
Responsibilities :
Cloud Infrastructure Management :
- Design, implement, and maintain cloud infrastructure on Alibaba Cloud.
- Ensure the scalability, performance, and security of cloud environments.
Service Deployment and Optimization :...
-
Lead Engineer, Murex
4 weeks ago
Chennai, Tamil Nadu, India NatWest Group Full time
Join us as a Lead Engineer
This is an opportunity for a driven Lead Engineer to join us and support the technical delivery of a software engineering team
You'll be responsible for developing solution design options and explaining the pros and cons to key stakeholders for appropriate decision making
Hone your existing technical skills and advance your career in...
-
Application Support
2 weeks ago
Chennai, Tamil Nadu, India timesjobs Full time
Application Support
Location: Chennai
Role Definition:
Responsible for ensuring the smooth operation and functionality of clients application. The primary role is to provide technical assistance, troubleshoot issues, and maintain application to minimize downtime.
Key Responsibilities:
- Provide L1 support to global users.
- Classification of reported incidents, taking...