Urgent: Incident Response Engineer

4 weeks ago

Chennai, Tamil Nadu, India Centific Full time

Job Description

Centific is a frontier AI data foundry that curates diverse, high-quality data, using our purpose-built technology platforms to empower the Magnificent Seven and our enterprise clients with safe, scalable AI deployment. Our team includes more than 150 PhDs and data scientists, along with more than 4,000 AI practitioners and engineers. We harness the power of an integrated solution ecosystem, comprising industry-leading partnerships and 1.8 million vertical domain experts in more than 230 markets, to create contextual, multilingual, pre-trained datasets; fine-tuned, industry-specific LLMs; and RAG pipelines supported by vector databases. Our zero-distance innovation solutions for GenAI can reduce GenAI costs by up to 80% and bring solutions to market 50% faster.

Our mission is to bridge the gap between AI creators and industry leaders by bringing best practices in GenAI to unicorn innovators and enterprise customers. We aim to help these organizations unlock significant business value by deploying GenAI at scale, helping to ensure they stay at the forefront of technological advancement and maintain a competitive edge in their respective markets.

About the Job

- Role Title: Incident Response Engineer
- Role Overview: As an Incident Response Engineer at Centific, you will be responsible for handling and mitigating critical system incidents, ensuring minimal downtime and rapid recovery of services. This role involves working with cross-functional teams to detect, analyze, and resolve incidents efficiently. You will be required to improve incident handling processes, develop automated response strategies, and maintain documentation of all incidents and resolutions. Your expertise in managing real-time operational incidents and post-incident analysis will play a critical role in maintaining system stability and business continuity.
- This is a hands-on role that requires deep knowledge of incident response frameworks, system troubleshooting, security monitoring, and automated remediation.

Key Responsibilities:

Incident Detection & Monitoring:

- Implement real-time incident detection using tools like PagerDuty/Opsgenie/VictorOps for on-call alerting and escalations.
- Monitor system health, logs, and telemetry using Splunk/Elastic Stack (ELK)/Sentry/Grafana Loki to identify early warning signs of system failures.
- Configure and fine-tune SIEM solutions (Splunk/Graylog/Wazuh) for log-based security and operational threat detection.

Why Join Centific

- High-Impact Role: Be at the forefront of mitigating critical system incidents and ensuring business continuity.
- Cutting-Edge Technology: Work with modern automation, monitoring, and security tools.
- Global Exposure: Collaborate with teams supporting enterprise-scale infrastructure worldwide.
- Career Growth: Access to security certifications, SRE training, and industry-leading upskilling programs.
- Work-Life Balance: Hybrid work model, shift flexibility, and wellness programs.

Skills:

- Ability to remain calm under pressure and manage incidents in high-stress environments.
- Ownership and accountability in resolving incidents from detection to closure, including post-mortem analysis.
- Strong coordination skills to communicate incident status clearly with engineers, leadership, and external teams.
- Process-oriented thinking to follow structured incident response playbooks and continuously improve workflows.
- Ability to make rapid decisions in time-sensitive scenarios to minimize downtime and mitigate risks.

Good-to-have Qualifications:

- Certifications: GIAC Certified Incident Handler (GCIH), AWS Certified Security - Specialty, or Certified Information Systems Security Professional (CISSP).
- Threat Hunting & Detection: Knowledge of MITRE ATT&CK framework and threat intelligence integration.
- Chaos Engineering: Hands-on experience with Gremlin/LitmusChaos for incident testing and resilience validation.
- Network Troubleshooting: Understanding of packet analysis, firewall logs, and intrusion detection systems (IDS/IPS).
- Disaster Recovery Planning: Experience in business continuity planning and disaster recovery (BCP/DR) testing.

Must-Have Qualifications:

- Education: Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Experience: 3+ years of hands-on experience in incident response, system monitoring, and operational troubleshooting.
- Monitoring & Alerting Expertise: Proficiency with PagerDuty/Opsgenie/VictorOps/Splunk/Elastic Stack (ELK)/Sentry/Grafana Loki.
- Incident Response & RCA: Experience conducting root cause analysis (RCA) and post-mortem reviews.
- Automation & Scripting: Hands-on experience with Python/Bash/Ansible to develop automation scripts for incident resolution.
- Security Incident Handling: Familiarity with SIEM tools (Splunk/Graylog/Wazuh) and forensic analysis tools (TheHive/Velociraptor).
- CI/CD & Incident Remediation: Understanding of automated rollback strategies, self-healing systems, and deployment remediation.

Collaboration & Training:

- Coordinate incident response drills and tabletop exercises to improve team readiness.
- Train operational teams in incident detection, escalation, and response best practices.
- Work closely with SRE, DevOps, and Observability Engineers to optimize response workflows and improve system observability.
- Ensure compliance with standards such as GDPR, HIPAA, and ISO 27001 in incident handling and logging.
- Implement threat intelligence feeds to stay ahead of emerging security threats.

Security Incident Response & Compliance:

- Work with security teams to investigate and mitigate security-related incidents.
- Conduct forensic analysis on compromised systems and logs using TheHive/Velociraptor/Splunk SOAR.

Automated Remediation & Incident Prevention:

- Develop self-healing automation using Ansible/Python/Bash to proactively remediate common failures.
- Implement automated rollback and recovery mechanisms within CI/CD pipelines to reduce impact during deployments.
- Integrate AI-driven anomaly detection to proactively detect and prevent potential failures before they escalate.
- Test and deploy chaos engineering tools (Gremlin/LitmusChaos) to validate system resilience under stress conditions.

Root Cause Analysis & Post-Incident Review:

- Conduct post-mortem analysis and root cause analysis (RCA) for all major incidents.
- Work closely with SRE and security teams to identify persistent failure patterns and recommend long-term fixes.
- Document all incident reports, mitigation steps, and RCA findings to enhance organizational learning and incident prevention.
- Improve incident classification to differentiate between operational failures, security breaches, and performance degradation.

Incident Response & Mitigation:

- Respond to critical incidents in a 24/7 shift rotation, ensuring minimal downtime and quick service recovery.
- Follow standardized Incident Response Playbooks to handle various system failures, security incidents, and infrastructure outages.
- Develop and maintain incident triage and escalation processes, ensuring clear handoffs between teams.
- Implement runbook automation to execute predefined mitigation steps for common incidents.

Incident Manager

1 week ago

Chennai, Tamil Nadu, India Qode Full time US$ 90,000 - US$ 1,20,000 per year

What you'll be doing: Reporting to the Global Service Delivery Manager in support of the Incident & Problem processes and procedures. Closely collaborate with the Global Service Delivery Manager, Service Support Manager and regional Incident Managers to increase service stability by identifying all improvement opportunities in the operational area. Identify...
Cybersecurity Incident Manager

3 days ago

Chennai, Tamil Nadu, India beBeeIncident Full time ₹ 9,00,000 - ₹ 12,00,000

Incident Management Specialist. This is a challenging role that requires the ability to manage and coordinate incident response activities. Key Responsibilities: Support cyber incident response actions to ensure proper assessment, containment, mitigation and documentation. Perform in-depth analysis and investigative efforts when events are escalated and determine...
Security Incident Responder

4 weeks ago

Chennai, Tamil Nadu, India Wpp Groups Full time

Job Description: WPP is a world leader in marketing services, with deep AI, data and technology capabilities, global presence and unrivalled creative talent. Our clients include many of the biggest companies and advertisers in the world, including approximately 300 of the Fortune Global 500. Our people are the key to our success. We're committed to fostering a...
Incident Management Coordinator

2 weeks ago

Chennai, Tamil Nadu, India TECEZE Full time

Job Title: Incident Management Coordinator – L1 Department: IT Operations / Service Desk Reports To: Incident Manager or IT Operations Lead Location: Remote/Hybrid Role Purpose: The Incident Management Coordinator (L1) is responsible for monitoring, logging, categorizing, and prioritizing incidents in alignment with the ITIL framework. This role acts...
Senior Incident Manager

4 weeks ago

Chennai, Tamil Nadu, India Wpp Groups Full time

Job Description: WPP is a world leader in marketing services, with deep AI, data and technology capabilities, global presence and unrivalled creative talent. Our clients include many of the biggest companies and advertisers in the world, including approximately 300 of the Fortune Global 500. Our people are the key to our success. We're committed to fostering a...
Incident Management Specialist

16 hours ago

Chennai, Tamil Nadu, India beBeeIncident Full time ₹ 15,00,000 - ₹ 25,00,000

Job Overview: We are seeking a seasoned Incident Management Specialist to lead our incident response efforts. Key Responsibilities: Analyze and facilitate swift resolution to incidents impacting end-users. Provide expert guidance on complex incidents, including strategic communications. Collaborate with IT groups and business partners to drive service restoration,...
Crisis Response Leader

3 days ago

Chennai, Tamil Nadu, India beBeeCommunication Full time ₹ 1,30,00,000 - ₹ 1,85,00,000

Major Incident Resolution Lead. A leading liner shipping organization seeks a skilled Major Incident Manager to oversee the resolution of critical incidents and ensure timely communication with stakeholders. Required skills include strategic thinking, problem-solving, excellent communication, and strong analytical skills. Qualifications necessary are a degree in...
Lead Incident Management

3 weeks ago

Chennai, Tamil Nadu, India Olam International Full time

Job Description: - Support cyber incident response actions to ensure proper assessment, containment, mitigation and documentation. - Perform in-depth analysis and investigative efforts when events are escalated and determine next appropriate containment / remediation / eradication efforts. - Research and evaluate new technologies like Anti APT solutions, SOAR,...
Walk in Urgent Requirement For Production Engineer

3 weeks ago

Chennai, Tamil Nadu, India green success infotech Full time

Walk in Urgent Requirement For Production Engineer. JD: Hiring Production Engineer for an Automobile industry. Qualification: Diploma / BE Mechanical. Experience: 0 - 5 years. Work Location: Chennai (Ambattur). Work timings: 9am to 5pm. Regards, Nedhra - HR. All the best
Urgent Requirement Quality Engineer

3 weeks ago

Chennai, Tamil Nadu, India GS infotech Full time

Urgent Requirement Quality Engineer. Full job description: Job Type: Permanent. Pay: 14k to 30k per month. Benefits: Food provided, Provident Fund. Schedule: Day shift, Morning shift, Rotational shift. Supplemental Pay: Yearly bonus. Work Location: In person. We are seeking a detail oriented Quality Engineer to ensure that products meet established quality...
