Senior Site Reliability Engineer
4 days ago
Join the Azure Specialized AI Infrastructure team in India to drive advancements in Artificial Intelligence (AI) and support high-performance infrastructure for generative AI workloads. As a Senior SRE, you will automate and maintain large-scale distributed systems powering the latest AI applications and machine learning models. Your primary focus will be on the reliability, scalability, and performance of AI infrastructure, ensuring seamless operations for mission-critical AI services. The role emphasizes a start-up mindset, collaboration, and customer advocacy.
Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities
Reliability: Ensure the reliability, scalability, and security of AI infrastructure supporting HPC and AI workloads.
Incident Management: Lead incident response, root cause analysis, and continuous improvement to minimize downtime and optimize service availability.
Performance Optimization: Identify and resolve bottlenecks in compute, storage, networking, and specialized hardware (GPUs, InfiniBand) to enhance AI system performance.
Infrastructure Automation: Develop and maintain automation tools for deployment, monitoring, predictive analysis, and management of AI infrastructure, including containerized environments (Kubernetes, Docker).
Technical Leadership: Provide technical guidance in cloud and AI infrastructure technologies, collaborating with cross-functional teams to drive innovation and best practices.
Customer Advocacy: Act as a customer advocate, focusing on service excellence and live site reliability for AI workloads.
Research and Innovation: Stay informed on emerging AI infrastructure technologies and industry trends, recommending adoption where beneficial.
Qualifications
Required/Minimum Qualifications
6 years of technical experience in software engineering, network engineering, or systems administration; OR a Bachelor's Degree in Computer Science, Information Technology, or a related field AND 3 years of technical experience in software engineering, network engineering, or systems administration; OR a Master's Degree in Computer Science, Information Technology, or a related field AND 2 years of technical experience in software engineering, network engineering, or systems administration.
5 years of hands-on experience developing and supporting infrastructure services for AI or cloud platforms.
Proven ability to modify componentized, well-architected infrastructure software and collaborate across teams.
1 year of experience with incident management and reliability engineering in cloud or AI environments.
Excellent interpersonal, communication, and collaboration skills.
Other Requirements
The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Additional or Preferred Qualifications
7 years of technical experience in software engineering, network engineering, OR systems administration; OR a Bachelor's Degree in Computer Science, Information Technology, OR a related field AND 4 years of technical experience in software engineering, network engineering, OR systems administration; OR a Master's Degree in Computer Science, Information Technology, OR a related field AND 3 years of technical experience in software engineering, network engineering.
Experience in distributed systems and/or cloud platforms (Azure, Kubernetes, Docker, containers ecosystem).
Experience with GPUs, InfiniBand, or similar high-performance technologies.
Proficiency in RDMA (Remote Direct Memory Access), MPI (Message Passing Interface), and high-performance computing architecture.
Proficient in scripting (PowerShell, Shell script, etc.) and deep expertise in Linux.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations, and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the azurecorejobs
-
Senior Site Reliability Engineer
7 days ago
Hyderabad, Telangana, India Microsoft Full time The Windows Cloud division is looking for a Senior Site Reliability Engineer who will help us take the Windows Cloud platform, as well as the Windows 365 Cloud PC and Azure Virtual Desktop business, to the next level. Windows 365 Cloud PC (W365) and Azure Virtual Desktop (AVD) have recently been recognized as leaders in the Gartner Magic Quadrant™ for Desktop as...
-
Senior Site Reliability Engineer
1 week ago
Hyderabad, Telangana, India Jade Global Software Pvt Ltd Full time ₹ 12,00,000 - ₹ 24,00,000 per year Job Title: Senior Site Reliability Engineer (SRE) – Datadog Observability Experience Required: 8+ years overall in SRE and Infrastructure Operations with minimum 3+ years hands-on experience in Datadog Location: Hyderabad...
-
Site Reliability Engineer
3 weeks ago
Hyderabad, Telangana, India Cubic Full time Business Unit: Cubic Transportation Systems Company Details: When you join Cubic, you become part of a company that creates and delivers technology solutions in transportation to make people's lives easier by simplifying their daily journeys, and defense capabilities to help promote mission success and safety for those who serve their nation. Led by our talented...
-
Senior Site Reliability Engineer
2 weeks ago
Hyderabad, Telangana, India Jade Global Full time ₹ 12,00,000 - ₹ 24,00,000 per year Job Title: Senior Site Reliability Engineer (SRE) – Datadog Observability Experience Required: 8+ years overall in SRE and Infrastructure Operations with minimum 3+ years hands-on experience in Datadog Location: Hyderabad preferable but open for Pune and remote Job Summary: We are seeking an...
-
Senior Site Reliability Engineer
2 weeks ago
Hyderabad, Telangana, India Jade Global Full time ₹ 1,00,00,000 - ₹ 2,00,00,000 per year Job Title: Senior Site Reliability Engineer (SRE) – Datadog Observability Experience Required: 8+ years overall in SRE and Infrastructure Operations with minimum 3+ years hands-on experience in Datadog Location: Hyderabad preferable but open for Pune and remote Job Summary: We are seeking an experienced Site Reliability Engineer (SRE) to lead end-to-end SRE...
-
Senior Site Reliability Engineer
1 week ago
Hyderabad, Telangana, India Instaresz Business Services Pvt Ltd Full time ₹ 20,00,000 - ₹ 25,00,000 per year Job Title: Senior Site Reliability Engineer (SRE) Experience Required: 10+ Years Location: Hyderabad (On-site) Employment Type: Full-Time About Instaresz: Instaresz Business Services Pvt. Ltd. focuses on building and scaling high-performance SaaS products with expertise in:
• SaaS Product Development
• Infrastructure & DevOps
• Data & Analytics
• AI & Automation
Our...
-
Site Reliability Engineer
3 weeks ago
Hyderabad, India SID Global Solutions Full time Job Role: Site Reliability Engineer (SRE) – GCP Experience: 3+ years Location: Hyderabad About SIDGS: SIDGS is a premium global systems integrator and global implementation partner of Google corporation, providing Digital Solutions & Services to Fortune 500 companies. Our Digital solutions go across following domains: User Experience, CMS, API Management,...
-
Site Reliability Engineer
3 weeks ago
Hyderabad, India Whatjobs IN C2 Full time Job Role: Site Reliability Engineer (SRE) – GCP Experience: 3+ years Location: Hyderabad About SIDGS: SIDGS is a premium global systems integrator and global implementation partner of Google corporation, providing Digital Solutions & Services to Fortune 500 companies. Our Digital solutions go across following domains: User Experience, CMS, API Management,...
-
Site Reliability Engineer
3 weeks ago
Hyderabad, India SID Global Solutions Full time Job Role: Site Reliability Engineer (SRE) – GCP Experience: 3+ years Location: Hyderabad About SIDGS: SIDGS is a premium global systems integrator and global implementation partner of Google corporation, providing Digital Solutions & Services to Fortune 500 companies. Our Digital solutions go across following domains: User Experience, CMS, API Management,...
-
Site Reliability Engineering- Lead/ Senior
17 hours ago
Hyderabad, Telangana, India Sonata Software Full time ₹ 20,00,000 - ₹ 25,00,000 per year Site Reliability Engineering- Lead/ Senior Full Time (Hybrid) Hyd Required Skills & Experience: Experience in Site Reliability Engineering, DevOps, or related Infrastructure Engineering roles. Expertise in Kubernetes and cloud platforms, especially AWS. Solid understanding of large-scale distributed systems. Proficient with Linux systems, networking, and storage...