Service Engineering II
1 day ago
Are you passionate about cloud computing, obsessed with customer experience, and driven to resolve complex issues under pressure? Do you thrive in high-stakes, live environments and want to play a pivotal role in ensuring the reliability of Microsoft's cloud platform? If so, the Azure Customer Experience (CXP) team has the opportunity for you.
Microsoft Azure is one of the most exciting and strategic products at Microsoft—powering mission-critical workloads for enterprises, governments, and startups around the world. Azure delivers on-demand, hyper-scale infrastructure and platforms via Microsoft's global data centers, enabling customers to build, host, and scale their applications with confidence.
The Customer Reliability Engineering (CRE) team within Azure CXP is a top-level pillar of Azure Engineering responsible for world-class live-site management, customer reliability engagements, and modern customer-first experiences for scale, and for driving deep customer insights and empathy into the broader Azure Engineering organization. Our "no dead ends" philosophy ensures that every customer, regardless of size or scale, can realize their full potential through the Microsoft Cloud.
We are seeking decisive and experienced Service Engineers to handle live-site issues, lead problem management, and drive customer reliability. This role is accountable for enhancing the customer experience across Azure, including First Party Services. The ideal candidate will demonstrate strong breadth in managing complex, highly available services, paired with deep technical expertise in Azure Core Services and their interdependencies. You will work closely with Customers, First Parties, Customer Support, Livesite, and Engineering teams to deliver critical, customer-facing features. Success in this role requires the ability to influence and collaborate across many Azure servicing teams to ensure customer needs are met.
In addition, this role includes on-call responsibilities for managing and resolving complex multi-service outages. It requires the ability to remain effective under pressure, apply broad technical and analytical skills, and coordinate seamlessly with internal service teams and stakeholders. Strong communication skills—both written and verbal—are essential. You will also lead the evolution of Azure's Incident Management practice through Post-Incident Reviews, process development, and system automation. By leveraging telemetry and metrics, you will identify and drive platform-wide improvements with global impact. You'll be the single point of command and control during high-severity incidents, orchestrating cross-functional engineering, operations, and communications to minimize impact, restore services quickly, and protect the trust of our global customer base.
This role offers a unique opportunity to make an immediate impact and improve systems at scale.
Responsibilities:
To be successful in this role, you must have a strong track record of customer compassion, an engineering mindset, an innate aptitude for agility, and technical excellence in software engineering. Collaborate closely with Engineering and PM to ensure the availability and performance of Live Site and the satisfaction of our customers.
- Manage high-severity incidents (SEV0/SEV1/SEV2) across Azure services, serving as the single point of accountability to ensure rapid detection, triage, resolution, and customer communication.
- Act as the central authority during live site incidents, driving real-time decision-making and coordination across Engineering, Support, PM, Communications, and Field teams.
- Provide calm, decisive leadership in crisis situations.
- Promote a customer-first culture by prioritizing availability, reliability, and platform trust in every response.
- Participate in the on-call rotation.
- Analyze customer-impacting signals from telemetry, support cases, and feedback to identify root causes, drive incident reviews (RCAs/PIRs), and implement preventative service improvements.
- Drive continuous improvement of the Azure platform by incorporating learnings from live site events and customer feedback, ensuring improved reliability, observability, and supportability.
- Collaborate closely with Engineering and Product teams to influence and implement service resiliency enhancements, auto-remediation tools, and customer-centric mitigation strategies.
- Identify and advocate for customer self-service capabilities, improved documentation, and scalable solutions that empower customers to resolve common issues independently.
- Contribute to the development and adoption of incident response playbooks, mitigation levers, and operational frameworks aligned to real-world support scenarios and strategic customer needs.
- Contribute to the design of next-generation architecture for cloud infrastructure services with a focus on reliability and strategic customer support outcomes.
- Build and maintain cross-functional partnerships, ensuring alignment across engineering, business, and support organizations.
- Be data-driven and results-focused, using metrics to evaluate incident response effectiveness and platform health.
- Bring an engineering mindset to operational challenges, balancing agility, scalability, and technical excellence.
- Exhibit strong cross-team collaboration, an engineering mindset, and results-oriented execution under pressure.
Required Qualifications:
- 6+ years of experience in roles such as cloud operations, incident response, SRE, or large-scale system engineering, preferably on platforms like Azure, AWS, or GCP.
- Must have Service Engineering experience in a 24x7x365 enterprise environment.
- Exceptional command-and-control communication skills: able to drive clarity and direction with customers, internal Microsoft stakeholders, and third-party vendors during ambiguity and chaos.
- Deep understanding of cloud architecture patterns, microservices, and containerization.
- Demonstrated ability to make decisions quickly, under pressure, and with limited data—without compromising long-term reliability.
- Familiarity with monitoring and observability tools (e.g., Grafana, Prometheus, Datadog, Splunk, New Relic).
- Ability to contribute to implementing observability frameworks that proactively detect performance bottlenecks.
- Strong knowledge of CI/CD pipelines, container orchestration (Kubernetes, Docker), and infrastructure as code (Terraform, ARM, Bicep).
- Familiarity with AI/ML frameworks and cloud AI services.
- Experience implementing AI-driven monitoring, alerting, and remediation systems
- Fluency in one or more automation languages (PowerShell, Python, CLI, etc.).
- Understanding of ITIL or other incident management frameworks is a must.
- Understanding of high availability, disaster recovery, business continuity, and performance tuning.
- Demonstrates strategic thinking, quantitative and analytical skills, team leadership, and collaboration
- Excellent problem resolution, judgment, negotiating and decision-making skills
- Desired: strong knowledge of the Windows or Linux platform and developer tools, and the ability to diagnose and debug user code.
- Effectively manage and prioritize multiple tasks in accordance with high-level objectives and projects.
- Excellent communication skills (written and verbal) in English, especially in high-pressure scenarios.
- Ability to communicate with a variety of audiences, including high-profile customers, executive management, and engineering teams.
- Experience with Azure, AWS, or GCP core services and their interdependence.
- Bachelor's or master's degree in Computer Science, Information Technology, or equivalent experience.
Preferred Qualifications:
- 6+ years of demonstrated experience as an Incident Commander or Crisis Manager for critical, high-severity incidents in high-availability, distributed environments.
- Experience with SRE (Site Reliability Engineering) principles and practices.
- Exposure to chaos engineering, fault injection, or high availability architecture.
- AI/ML Experience: [Beginner to Intermediate]
- Familiarity with how AI/ML models are integrated into cloud infrastructure and their potential failure modes.
- Experience using AI-powered tools for incident analysis, log correlation, or predictive alerting.
- An understanding of the challenges and risks associated with AI/ML systems in a production environment.
- Certifications:
- Relevant cloud certifications (e.g., AWS Certified DevOps Engineer, Azure Solutions Architect, GCP Professional Cloud Architect).
- Certifications in ITIL, SRE, or other relevant frameworks.
Every day, our customers stake their business and reputation on our cloud. You can help #AzCXP provide our customers with the world-class cloud services they need to succeed. #azcre
Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.