Senior Service Engineer
2 days ago
Are you passionate about cloud computing, obsessed with customer experience, and driven to resolve complex issues under pressure? Do you thrive in high-stakes, live environments and want to play a pivotal role in ensuring the reliability of Microsoft's cloud platform? If so, the Azure Customer Experience (CXP) team has the opportunity for you.
Microsoft Azure is one of the most exciting and strategic products at Microsoft—powering mission-critical workloads for enterprises, governments, and startups around the world. Azure delivers on-demand, hyper-scale infrastructure and platforms via Microsoft's global data centers, enabling customers to build, host, and scale their applications with confidence.
The Customer Reliability Engineering (CRE) team within Azure CXP is a top-level pillar of Azure Engineering. It is responsible for world-class live-site management, customer reliability engagements, and modern customer-first experiences at scale, and it drives deep customer insights and empathy into the broader Azure Engineering organization. Our "no dead ends" philosophy ensures that every customer, regardless of size or scale, can realize their full potential through the Microsoft Cloud.
We are seeking decisive, experienced Service Engineers to handle live site issues, lead problem management, and drive customer reliability. This role is accountable for enhancing the customer experience across Azure, including first-party services. The ideal candidate will demonstrate strong breadth in managing complex, highly available services, paired with deep technical expertise in Azure core services and their interdependencies. You will work closely with customers, first parties, Customer Support, Livesite, and Engineering teams to deliver critical, customer-facing features. Success in this role requires the ability to influence and collaborate across many Azure servicing teams to ensure customer needs are met.
In addition, this role includes on-call responsibilities for managing and resolving complex multi-service outages. It requires the ability to remain effective under pressure, apply broad technical and analytical skills, and coordinate seamlessly with internal service teams and stakeholders. Strong communication skills—both written and verbal—are essential. You will also lead the evolution of Azure's Incident Management practice through Post-Incident Reviews, process development, and system automation. By leveraging telemetry and metrics, you will identify and drive platform-wide improvements with global impact. You'll be the single point of command and control during high-severity incidents, orchestrating cross-functional engineering, operations, and communications to minimize impact, restore services quickly, and protect the trust of our global customer base.
This role offers a unique opportunity to make an immediate impact and improve systems at scale.
Responsibilities:
To be successful in this role, you must have a great track record of customer compassion, an engineering mindset, an innate aptitude for agility, and technical excellence in software engineering.
- Collaborate closely with Engineering and PM to ensure Live Site availability and performance and the satisfaction of our customers.
- Lead and manage high-severity incidents across Azure services, serving as the single point of accountability to ensure rapid detection, triage, resolution, and customer communication.
- Act as the central authority during live site incidents, driving real-time decision-making and coordination across Engineering, Support, PM, Communications, and Field teams.
- Contribute to the design of V.Next architecture for cloud infrastructure services, based on customer and first-party engagements.
- Engage in major production triage efforts, work with different teams to identify the root cause of highly impactful or complex issues, identify product gaps, and work with product teams to close them.
- Partner closely with software developers, product managers, architects, and infrastructure teams to deliver sustainable, reusable design solution patterns, ensuring that non-functional production support requirements are adopted early in migration and deployment.
- Promote a customer-first culture by prioritizing availability, reliability, and platform trust in every response.
- Participate in the on-call rotation.
- Analyze customer-impacting signals from telemetry, support cases, and feedback to identify root causes, drive incident reviews (RCAs/PIRs), and implement preventative service improvements.
- Drive continuous improvement of the Azure platform by incorporating learnings from live site events and customer feedback, ensuring improved reliability, observability, and supportability.
- Collaborate closely with Engineering and Product teams to influence and implement service resiliency enhancements, auto-remediation tools, and customer-centric mitigation strategies.
- Identify and advocate for customer self-service capabilities, improved documentation, and scalable solutions that empower customers to resolve common issues independently.
- Design and drive adoption of incident response playbooks, mitigation levers, and operational frameworks aligned to real-world support scenarios and strategic customer needs.
- Contribute to the design of next-generation architecture for cloud infrastructure services with a focus on reliability and strategic customer support outcomes.
- Build and maintain cross-functional partnerships, ensuring alignment across engineering, business, and support organizations.
- Be data-driven and results-focused, using metrics to evaluate incident response effectiveness and platform health.
- Bring an engineering mindset to operational challenges, balancing agility, scalability, and technical excellence.
- Exhibit strong cross-team collaboration, engineering mindset, and results-oriented execution under pressure
Required Qualifications:
- 10+ years of experience in cloud operations, incident response, SRE, or large-scale systems engineering roles, preferably on platforms such as Azure, AWS, or GCP.
- Extensive service engineering experience in always-on, zero-downtime enterprise environments, operating at global scale 24x7x365
- Exceptional command presence and executive-grade communication skills—able to impose clarity, direction, and alignment across customers, senior stakeholders, and third-party vendors in high-stakes, high-ambiguity situations
- Deep mastery of modern cloud architecture patterns, microservices design, and enterprise-grade container orchestration at scale
- Demonstrated ability to make critical, time-bound decisions under pressure, and with limited data—without compromising long-term reliability.
- Advanced proficiency with enterprise observability and monitoring ecosystems (Grafana, Prometheus, Datadog, Splunk, New Relic).
- Experience leading or significantly contributing to AI-augmented observability frameworks that proactively predict, detect, and eliminate performance bottlenecks
- Expert-level knowledge of CI/CD automation pipelines, large-scale container orchestration (Kubernetes, Docker), and infrastructure as code solutions (Terraform, ARM, Bicep) for hyperscale deployments.
- Hands-on experience with AI/ML frameworks and production-grade cloud AI services, applying them to operational intelligence and automation
- Proven success deploying AI-driven monitoring, predictive alerting, and automated remediation systems in mission-critical environments
- Fluency in one or more automation languages or tools (PowerShell, Python, CLI, etc.)
- Deep understanding of ITIL and modern incident management frameworks, with a track record of evolving processes for agility and scale.
- Mastery of high availability architectures, disaster recovery strategies, business continuity planning, and advanced performance tuning for distributed systems.
- Demonstrated strategic thinking, quantitative and analytical skills, team leadership, and collaboration
- Excellent problem-resolution, judgment, negotiation, and decision-making skills
- Desired: strong knowledge of the Windows platform or Linux and developer tools, and the ability to diagnose and debug user code
- Proven ability to triage, prioritize, and execute multiple critical workstreams in alignment with strategic objectives under time constraints.
- Excellent communication skills (written and verbal) in English, especially in high-pressure scenarios.
- Ability to communicate with a variety of audiences, including high-profile customers, executive management, and engineering teams.
- Deep, hands-on expertise with Azure, AWS, or GCP core services, including the ability to architect and troubleshoot complex interdependent systems.
- Bachelor's or master's degree in Computer Science, Information Technology, or equivalent experience
Preferred Qualifications:
- 10+ years of demonstrated experience as an Incident Commander or Crisis Manager for critical, high-severity incidents in high-availability, distributed environments.
- Experience with SRE (Site Reliability Engineering) principles and practices.
- Advanced exposure to chaos engineering, systemic fault injection, and designing for failure-resilient, self-healing architectures
- AI/ML Experience: [Beginner to Intermediate]
  - Familiarity with how AI/ML models are integrated into cloud infrastructure and their potential failure modes.
  - Experience using AI-powered tools for incident analysis, log correlation, or predictive alerting.
  - An understanding of the challenges and risks associated with AI/ML systems in a production environment.
- Certifications:
  - Relevant cloud certifications (e.g., AWS Certified DevOps Engineer, Azure Solutions Architect, GCP Professional Cloud Architect).
  - Certifications in ITIL, SRE, or other relevant frameworks.
Every day, our customers stake their business and reputation on our cloud. You can help #AzCXP provide our customers with the world-class cloud services they need to succeed. #azcre
Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
-
Senior Enterprise Service Engineer
2 days ago
Hyderabad, Telangana, India e2open Full time ₹ 9,00,000 - ₹ 12,00,000 per year
POSITION OVERVIEW: The Senior Enterprise Service Engineer is a member of e2open's Enterprise Service Engineering team, delivering high-quality Level 3 customer service for E2open's Supply Chain applications to meet our global customers' Service Level Agreements and operational requirements. The successful candidate will be adaptable, detail-oriented,...
-
Senior Service Engineer
1 week ago
Hyderabad, Telangana, India Microsoft Full time US$ 1,50,000 - US$ 2,00,000 per year
Are you passionate about cloud computing, obsessed with customer experience, and driven to resolve complex issues under pressure? Do you thrive in high-stakes, live environments and want to play a pivotal role in ensuring the reliability of Microsoft's cloud platform? If so, the Azure Customer Experience (CXP) team has the opportunity for you. Microsoft...
-
Senior Data Engineer
5 days ago
Hyderabad, Telangana, India beBeeDataEngineer Full time ₹ 15,00,000 - ₹ 21,90,000
We are seeking a highly skilled and experienced Senior Data Engineer to join our dynamic team. As a senior member of the team, you will collaborate with business partners, IT, and external vendors to develop scalable and sustainable solutions that enhance HR processes and service delivery.
-
Senior ETAP Engineer
2 days ago
Hyderabad, Telangana, India VB® Engineering (I) Pvt Ltd Full time US$ 80,000 - US$ 1,20,000 per year
Role Description: This is a full-time on-site role for a Senior ETAP Engineer located in Telangana, India. The Senior ETAP Engineer will be responsible for conducting power system studies, performing ETAP analysis, preparing technical reports, and providing engineering consulting services. They will also be expected to collaborate with team members, ensure...
-
Senior Service engineer
2 days ago
Hyderabad, Telangana, India Johnson Controls Full time ₹ 9,00,000 - ₹ 12,00,000 per year
Job Description
Job Title: HVAC Chiller Service Engineer - Troubleshooting
Job Summary: We are seeking an experienced HVAC Chiller Service Engineer to join our team. The ideal candidate will have expertise in diagnosing, troubleshooting, and repairing HVAC chiller systems, ensuring efficient operation and minimizing downtime. The role requires a strong technical...
-
Senior Enterprise Service Engineer
2 days ago
Hyderabad, Telangana, India e2open Full time ₹ 8,00,000 - ₹ 12,00,000 per year
E2open is the connected supply chain platform that enables the world's largest companies to transform the way they make, move, and sell goods and services. We connect more than 400,000 partners as one multi-enterprise network. Powered by the network, data, and applications, our SaaS platform anticipates disruptions and opportunities to help companies improve...
-
Senior Engineer
2 days ago
Hyderabad, Telangana, India MP TECHNOLOGIES Full time ₹ 1,04,000 - ₹ 1,30,878 per year
Company Description: MP Technologies is a leading provider of innovative security solutions, specializing in electronic surveillance, access control, fire alarm systems, home automation, and data center security. Known for reliability and expertise, MP Technologies has become a trusted partner for businesses, organizations, and individuals seeking...
-
Senior Service Manager
4 weeks ago
Hyderabad, Telangana, India LSEG Full time
Job Description
Role Profile: The Service Management group resides within the Corporate Engineering Reliability Engineering & Enablement Function. It provides operational end-to-end service ownership and support, and manages incident, problem and change processes. The Senior Service Manager is accountable for service management throughout the service lifecycle...
-
Senior Network Engineer
23 hours ago
Hyderabad, Telangana, India Evron Networks Full time ₹ 1,50,000 - ₹ 28,00,000 per year
Company Description: Evron Networks Private Limited is an IT services company that provides comprehensive technology solutions tailored to today's evolving business landscape. Our team of skilled professionals uses advanced processes and technology to deliver integrated enterprise solutions that align business strategy with IT initiatives. With extensive...
-
Service Engineer
1 week ago
Hyderabad, Telangana, India Microsoft Full time US$ 1,50,000 - US$ 2,00,000 per year
Are you passionate about cloud computing, obsessed with customer experience, and driven to resolve complex issues under pressure? Do you thrive in high-stakes, live environments and want to play a pivotal role in ensuring the reliability of Microsoft's cloud platform? If so, the Azure Customer Experience (CXP) team has the opportunity for you. Microsoft...