SRE Operations Engineering - Gurugram - McCain Foods
2 months ago
JOB PURPOSE:
The Major Incident Manager is responsible for overseeing McCain's major incident management process, bringing SRE- and automation-driven thought leadership within global technology and ensuring a timely and effective response to significant disruptions or infrastructure incidents that impact business operations. Major incidents include, but are not limited to, those affecting infrastructure, network, cloud, and on-premise applications.
KEY RESPONSIBILITIES:
- Lead McCain's major incident management process, including the identification, assessment, and resolution of significant disruptions or incidents affecting business operations.
- Establish and maintain predefined criteria and procedures for categorizing and prioritizing major incidents based on severity, impact, and urgency.
- Coordinate cross-functional response efforts during major incidents, working closely with internal teams, external vendors, and stakeholders to minimize downtime and restore services expeditiously.
- Serve as the primary point of contact and escalation for major incidents, providing regular updates and communication to stakeholders, including senior management, customers, and regulatory authorities.
- Conduct post-incident reviews and analysis to identify root causes, lessons learned, and opportunities for improvement in incident response procedures.
- Develop and maintain relationships with key stakeholders, including I&O teams, business units, and external partners, to facilitate effective incident response and resolution.
- Implement and maintain robust monitoring and alerting systems to proactively identify potential issues and mitigate risks before they escalate into major incidents.
- Provide guidance and support to incident response teams, including training, coaching, and knowledge sharing, to enhance their effectiveness and efficiency in managing major incidents.
- Participate in the development and implementation of business continuity and disaster recovery plans to ensure the organization's ability to respond to and recover from major incidents.
- Continuously work to improve problem identification and service restoration by leading and overseeing efforts to define, enhance, and deliver automated alerting and response systems with intelligent, self-healing capabilities.
- Continuously work to improve the reliability, stability, and performance of the infrastructure and associated platforms by overseeing the implementation of fully automated telemetry, observation, and applied intelligence systems.
- Fulfill the role of Escalation Manager/Critical Incident Manager on major incidents, facilitating incident resolution and leading teams through effective service restoration.
- Communicate and provide timely status and incident reports to senior leadership.
- Collaborate with admins and platform engineers through implementation decisions to achieve highly reliable infrastructure, systems, and integrations.
- Lead conversations and provide business and engineering support for both in-house and external customers.
- Provide advanced Incident Management and Problem Management support to teams, to effectively identify, remediate, and resolve issues related to platform reliability, stability, and performance through careful analysis of telemetry data and system logs.
- Document all changes in line with controls, procedures, and documentation standards, and raise issues and concerns with recommendations for follow-up action.
- Partner with Site Reliability Engineering team to integrate and enhance monitoring and alerting systems that detect anomalies and potential incidents before they escalate.
- Partner with Observability team to co-develop incident resolution playbooks, detailing steps for common incident types, ensuring quick and effective responses.
- Partner with Site Reliability Engineering team to identify opportunities for automating incident response processes, such as automated rollback procedures or self-healing scripts.
- Implement and utilize automation tools available and recommended by the SRE team to streamline incident management processes.
- Drive automation with predictive intelligence and AI for incident categorization, smart routing, and AI-driven root cause analysis (RCA), and leverage clustering algorithms to group similar incidents, helping to identify common root causes and patterns.
- Partner with ServiceNow Platform team to drive and support adoption of platform automation and predictive intelligence capabilities.
KEY QUALIFICATIONS & EXPERIENCE:
- Bachelor's degree in computer science, information technology, or a related field.
- 12+ years of IT operations experience, including a minimum of 5 years in incident management and major incident management in a complex environment at a global organization.
- 10+ years of experience working in global organizations, with the ability to communicate effectively with executives, leaders, and individual contributors across the organization.
- 5+ years of SRE experience working with telemetry, observation, self-healing solutions, and platform automation.
- Experience with monitoring, logging, and telemetry tools such as New Relic, Splunk, ELK, Nagios, SolarWinds, Prometheus, AWS CloudWatch, and Datadog.
- Azure/AWS, Microsoft, and Red Hat certifications, and knowledge of ITIL/MOF practices.
- Strong technical expertise in areas of IT infrastructure, networking, security and applications support.
- Excellent communication and interpersonal skills, with the ability to effectively interact with stakeholders at all levels of the organization.
- Proven leadership and decision-making skills, with the ability to remain calm under pressure and make effective decisions in high-stress situations.
- Relevant industry certifications (e.g., ITIL, SRE, PMP, CISSP) are a plus.