SRE Operations Engineering

1 week ago


Delhi, India McCain Foods Full time
JOB PURPOSE: The Major Incident Manager will be responsible for overseeing McCain's major incident management process, bringing SRE- and automation-driven thought leadership within Global Technology and ensuring a timely and effective response to significant disruptions or infrastructure incidents that impact business operations. Major incidents include, but are not limited to, those affecting infrastructure, network, cloud and on-premise applications.

KEY RESPONSIBILITIES:

  • Lead McCain's major incident management process, including the identification, assessment, and resolution of significant disruptions or incidents affecting business operations.
  • Establish and maintain predefined criteria and procedures for categorizing and prioritizing major incidents based on severity, impact, and urgency.
  • Coordinate cross-functional response efforts during major incidents, working closely with internal teams, external vendors, and stakeholders to minimize downtime and restore services expeditiously.
  • Serve as the primary point of contact and escalation for major incidents, providing regular updates and communication to stakeholders, including senior management, customers, and regulatory authorities.
  • Conduct post-incident reviews and analysis to identify root causes, lessons learned, and opportunities for improvement in incident response procedures.
  • Develop and maintain relationships with key stakeholders, including I&O teams, business units, and external partners, to facilitate effective incident response and resolution.
  • Implement and maintain robust monitoring and alerting systems to proactively identify potential issues and mitigate risks before they escalate into major incidents.
  • Provide guidance and support to incident response teams, including training, coaching, and knowledge sharing, to enhance their effectiveness and efficiency in managing major incidents.
  • Participate in the development and implementation of business continuity and disaster recovery plans to ensure the organization's ability to respond to and recover from major incidents.
  • Continuously improve problem identification and service restoration by leading and overseeing efforts to define, enhance, and deliver automated alerting and response systems with intelligent, self-healing capabilities.
  • Continuously improve the reliability, stability, and performance of the infrastructure and associated platforms by overseeing the implementation of fully automated telemetry, observation, and applied intelligence systems.
  • Fulfill the role of Escalation Manager/Critical Incident Manager on major incidents, facilitating incident resolution by leading teams through effective service restoration.
  • Communicate and provide timely status and incident reports to senior leadership.
  • Collaborate with admins and platform engineers on implementation decisions to achieve highly reliable infrastructure, systems, and integrations.
  • Lead conversations and provide business and engineering support for both in-house and external customers.
  • Provide advanced Incident Management and Problem Management support to teams to effectively identify, remediate, and resolve issues related to platform reliability, stability, and performance through careful analysis of telemetry data and system logs.
  • Document all changes following controls, procedures and documentation standards, and raise issues and concerns with recommendations for follow-up action.
  • Partner with the Site Reliability Engineering team to integrate and enhance monitoring and alerting systems that detect anomalies and potential incidents before they escalate.
  • Partner with the Observability team to co-develop incident resolution playbooks detailing steps for common incident types, ensuring quick and effective responses.
  • Partner with the Site Reliability Engineering team to identify opportunities for automating incident response processes, such as automated rollback procedures or self-healing scripts.
  • Implement and use the automation tools available and recommended by the SRE team to streamline incident management processes.
  • Drive automation with predictive intelligence and AI for incident categorization, smart routing, and AI-driven RCA, and leverage clustering algorithms to group similar incidents, helping to identify common root causes and patterns.
  • Partner with the ServiceNow Platform team to drive and support adoption of platform automation and predictive intelligence capabilities.

KEY QUALIFICATIONS & EXPERIENCE:

  • Bachelor's degree in computer science, information technology, or a related field.
  • 12+ years of IT operations experience, including at least 5 years in incident management and major incident management in a complex environment at a global organization.
  • 10+ years of experience working in global organizations, with the ability to communicate effectively with executives, leaders and individual contributors across the organization.
  • 5+ years of SRE experience working with telemetry, observation, self-healing solutions, and platform automation.
  • Experience with monitoring, logging and telemetry tools such as New Relic, Splunk, ELK, Nagios, SolarWinds, Prometheus, AWS CloudWatch, Datadog, etc.
  • Azure/AWS, Microsoft and Red Hat certifications, and knowledge of ITIL/MOF practices.
  • Strong technical expertise in IT infrastructure, networking, security and application support.
  • Excellent communication and interpersonal skills, with the ability to interact effectively with stakeholders at all levels of the organization.
  • Proven leadership and decision-making skills, with the ability to remain calm under pressure and make effective decisions in high-stress situations.
  • Relevant industry certifications (e.g., ITIL, SRE, PMP, CISSP) are a plus.



  • SRE Engineer

    2 weeks ago


    New Delhi, India mccainfood Full time

    Position Title: SRE Engineer. Position Type: Regular - Full-Time. Position Location: New Delhi. Requisition ID: 30491. JOB PURPOSE: Reporting to the Sr Manager, DevSecOps & SRE, the Site Reliability Engineer will be responsible for: Site reliability engineers (SREs) are responsible for improving system reliability and resilience to make it faster...


  • Delhi, India Max Life Insurance Company Limited Full time

    Position: Assistant Vice President – Site Reliability Engineering (SRE). Job Summary: Responsible for system performance and uptime, IT digital operations, and maintaining and enhancing systems’ operational efficiency, with a focus on deployment automation and system optimization, ensuring consistent performance and reliability. The candidate must have robust...


  • Delhi, India Mancer Consulting Services Full time

    Director/Head of SRE (SME) – Pune or Hyderabad. Principal responsibilities: The Head of SRE Technology works with the Value Streams, providing Site Reliability Engineering leadership, vision and direction as part of a global Production Integrity and SRE team. The Head of SRE will be accountable for: working with teams to develop ways of working, helping establish...

  • Lead SRE

    2 days ago


    Delhi, India CES Full time

    Key Skills and Competencies Required: 7+ years of extensive experience with Infrastructure as Code (IaC) and Desired State Configuration (DSC) tools such as Terraform, CDK and Chef; 6+ years of experience packaging, deploying, and managing containerized workloads running in common PaaS solutions (i.e. Docker, Kubernetes); 6+ years of expertise in managing AWS...

  • Senior Engineer

    4 weeks ago


    Delhi, India C&R Software Full time

    Job Description Summary: The Cloud Operations team is accountable for the operational excellence of the C&R cloud platform, which hosts several business-critical, client-facing applications. The objective of the SRE within Cloud Operations is to coordinate a timely and focused organisation-wide response to severe/high-impact technical incidents arising from...


  • Delhi, India Rrootshell Technologiiss Pvt Ltd Full time

    Dear Associates, hope you are doing well and safe! Greetings from Rrootshell. We are hiring: urgent requirement for a Tibco, Boomi or Mulesoft Integration Engineer. This is a full-time role and a remote work opportunity in India. Job Description: Integration Platform: 8+ years of experience as a Senior Integration Platform Engineer and Production...


  • Delhi, India McCain Foods Full time

    JOB RESPONSIBILITIES: Work with stakeholders such as product owners and Engineering to define service level objectives (SLOs) for system operations. Track performance against SLOs in partnership with monitoring teams or other stakeholders, and ensure systems continue to meet SLOs over time. Create dashboards and reports to communicate key metrics. Create...


  • Delhi, India Integra Connect Full time

    About IntegraConnect Integra Connect delivers a comprehensive, integrated suite of cloud-based technologies and services that enable specialty groups to optimize clinical and financial performance as reimbursement shifts to value-based models. Connected by the IntegraCloud platform, the company’s core applications span population health including care...


  • Sr Mgr I&O

    3 weeks ago


    New Delhi, India mccainfood Full time

    Position Title: Sr Mgr I&O - Cloud, DevSecOps, SRE & Obs. Position Type: Regular - Full-Time. Position Location: New Delhi. Requisition ID: 31779. Our Global Technology team’s goal is to leverage technology and data to drive profitable growth, focus on enhancing customer experience and to further our purpose of 'Celebrating real connections...

  • DevOps Engineer

    4 weeks ago


    Delhi, India Vertex Agility Full time

    Developer DevOps Engineer. Role Overview: As a Senior DevOps Engineer, you'll work within our Client Capabilities team, driving innovation through analytics, design thinking, and advanced technology. You'll collaborate with product managers, developers, and operations teams to build and maintain custom solutions following Site Reliability Engineering (SRE)...


  • Delhi, India Alp Consulting Ltd. Full time

    Experienced L3 SRE engineer working on a business-critical SaaS application. Capacity to provide L3 support across the full stack, including infra, backend and front-end, before escalation to the engineering business unit. Capacity to automate SRE tools to provide proactive L3 support, close to our tech monitoring strategy. Capacity to work under business pressure for business-critical...

  • DevOps Engineer

    2 days ago


    Delhi, India Mastech Digital Full time

    Title: SRE/DevOps Engineer. Location: 100% Remote WFH Role (India). Work Hours: 1 PM IST – 10 PM IST (Flexible). Job Summary: We are seeking an experienced SRE/DevOps Engineer to join our team. The ideal candidate will have over 10 years of strong background in site reliability engineering and DevOps practices, with expertise in Kubernetes, Terraform, cloud...

  • DevOps Engineer

    2 weeks ago


    Delhi, India Mastech Digital Full time

    Title: SRE/DevOps Engineer. Location: 100% Remote WFH Role (India). Work Hours: 1 PM IST – 10 PM IST (Flexible). Job Summary: We are seeking an experienced SRE/DevOps Engineer to join our team. The ideal candidate will have over 10 years of strong background in site reliability engineering and DevOps practices, with expertise in Kubernetes, Terraform, cloud...


  • New Delhi, India eJAmerica Full time

    Job Description: We are seeking a highly skilled and experienced Senior Site Reliability Engineer (SRE) to join our dynamic team focused on security products and projects. The ideal candidate will have a strong background in Linux, Windows, and VMware environments, with proven expertise in Infrastructure as Code (IaC) using tools and languages such as Java,...


  • New Delhi, India Evernorth Health Services Full time

    Qualifications: Required Skills: Strong oral and written communication skills, including presentation skills (MS Visio, MS PowerPoint, MS Word). Ability to prioritise, work independently with ambiguity and manage multiple assignments. Strong conflict resolution skills with the ability to exercise mature judgement. Proven ability to analyse data, troubleshoot...

  • BE Engineer

    4 months ago


    Delhi, India HyreFox Consultants Full time

    The Role: The ideal candidate will work on multiple projects that will impact the entire engineering organization. Adapting to change and adopting new technologies, they will constantly be solving new challenges as they arise. They are passionate about leading initiatives and delivering results. The work will be focused around enabling our engineering teams...