Senior Site Reliability Engineer

5 months ago


Hyderabad, India Microsoft Full time

Overview

Do you have a passion for high-scale services and working with some of Microsoft’s most critical cloud capabilities? We’re looking for a Senior Site Reliability Engineer with the right mix of software development, cloud experience, and passion for quality to envision, design, and deliver solutions for Microsoft's cloud infrastructure.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.

Are you looking to be at the forefront of Microsoft’s cloud computing transformation? Are you looking to work in an agile environment that ships frequently while maintaining a focus on long-term bets? Do you want to work with state-of-the-art distributed systems that perform near-real-time detections on petabyte-scale telemetry, using machine learning and traditional software to deliver on Cloud Availability and Safety goals? Do you want to make an impact in a team of talented engineers delivering world-class software solutions?

Microsoft Cloud Operations & Innovation (CO+I) is the engine that powers Microsoft cloud services through the operation of our unified global datacenters, enabling ~30% of Microsoft revenue through Commercial Cloud ($38 billion in FY20 Q1). The Cloud Infrastructure Health team in CO+IE is focused on improving Customer Availability, Datacenter Safety, and Capacity, and on helping optimize the utilization of datacenter resources using telemetry and insights. Our systems analyze petabyte-scale telemetry data from datacenter critical environments and secondary signals, both in near real time and offline, to produce time-sensitive insights that directly impact Cloud Operations.

Our team is looking for an experienced, competent, and motivated Senior SRE. The Site Reliability Engineering (SRE) team provides leadership, direction, and accountability for application architecture, system design, and end-to-end implementation. As a Site Reliability Engineer you will identify and deliver software improvements using your expertise in software development, complexity analysis, and scalable system design. Collaboration skills will be required to work closely with other engineering teams to ensure services and systems are highly stable and performant, meeting the expectations of our customers and users.

SRE participates in the service design aspects of the Cloud Infrastructure Health (CIH) system and takes primary responsibility for developing code, scripts, systems, and/or tools that reduce operational burden by automating complex and repetitive tasks, such as onboarding system capabilities to newer datacenters and maintaining system capabilities at existing sites. The SRE enables feature teams to increase the velocity at which they can safely deploy changes to production and monitor the effects of those changes across the footprint. SRE analyzes telemetry data to develop capacity planning models, identify patterns and trends that drive continuous improvement, and highlight opportunities to deploy automation to monitor and manage CIH services across sites. SRE also participates in on-call rotations to resolve live site incidents, minimize customer impact, and document solutions and insights that inform ongoing improvements to infrastructure, code, tools, and/or processes that prevent the recurrence of similar issues.

Qualifications

Required/Minimum Qualifications:

6+ years technical experience in software engineering, network engineering, or systems administration
OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
OR Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration.

Other Requirements:

Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Preferred/Additional Qualifications:

7+ years technical experience in software engineering, network engineering, or systems administration
OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration
OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
OR Doctorate Degree in Computer Science, Information Technology, or related field.
Familiarity with one or more general purpose programming languages including but not limited to: Java, C/C++, C#, Python, JavaScript, PowerShell
Experience with the Microsoft cloud and/or stack, including O365, Azure, Windows, or other Microsoft software/services
Experience leveraging cloud architecture, applying site reliability principles, and/or demonstrating sensitivity to operational concerns
Demonstrated ability to debug, fix, and optimize code
Full-stack troubleshooting skills across network, application, hardware, management fabric, and distributed services layers
Excellent communications skills, both verbal and written

#COIcareers

#COIEngCareers

#COIE_DIODEcareers

Responsibilities

Design, develop, and deliver the software engineering required to reduce operational burden by automating complex and repetitive tasks, such as onboarding system capabilities to newer datacenters and maintaining system capabilities at existing sites
Own deployment, availability, reliability, performance, and customer escalation targets for Critical Environment Telemetry solutions
Proactively identify and reduce issues through design, testing, and implementation of software-based solutions
Collaborate with Engineering and Program Management partners to translate customer, business, and technical requirements into architectural designs and feature releases
Drive efficiencies through software improvement and root cause analysis resulting in service delivery, maturity, and scalability
Work within a highly skilled team of engineers to deliver revolutionary improvements to the system and scale them

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.

Industry leading healthcare
Educational resources
Discounts on products and services
Savings and investments
Maternity and paternity leave
Generous time away
Giving programs
Opportunities to network and connect
