Senior Site Reliability Engineer

6 months ago


Hyderabad, India Microsoft Full time

Overview

Do you have a passion for high-scale services and working with some of Microsoft’s most critical cloud capabilities? We’re looking for a Senior Site Reliability Engineer with the right mix of software development, cloud experience, and passion for quality to envision, design, and deliver solutions for Microsoft's cloud infrastructure.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.

Are you looking to be at the forefront of Microsoft’s cloud computing transformation? Are you looking to work in an agile environment that ships frequently while maintaining a focus on long-term bets? Do you want to work with state-of-the-art distributed systems that run near-real-time detections on petabyte-scale telemetry, using machine learning and traditional software to deliver on Cloud Availability and Safety goals? Do you want to make an impact in a team of talented engineers delivering world-class software solutions?

Microsoft Cloud Operations & Innovation (CO+I) is the engine that powers Microsoft cloud services through the operation of our unified global datacenters, enabling ~30% of Microsoft revenue through Commercial Cloud ($38 billion in FY20 Q1). The Cloud Infrastructure Health team in CO+IE is focused on improving Customer Availability, Datacenter Safety, and Capacity, and on helping optimize the utilization of datacenter resources using telemetry and insights. Our systems analyze petabyte-scale telemetry data from datacenter critical environments and secondary signals, both in near real time and offline, to enable time-sensitive insights that directly impact Cloud Operations.

Our team is looking for an experienced, competent, and motivated Senior SRE. The Site Reliability Engineering (SRE) team provides leadership, direction, and accountability for application architecture, system design, and end-to-end implementation. As a Site Reliability Engineer, you will identify and deliver software improvements using your expertise in software development, complexity analysis, and scalable system design. Collaboration skills will be required to work closely with other engineering teams to ensure services and systems are highly stable and performant, meeting the expectations of our customers and users.

The SRE participates in the service design aspects of the Cloud Infrastructure Health (CIH) system and takes primary responsibility for developing code, scripts, systems, and/or tools that reduce operational burden by automating complex and repetitive tasks, such as onboarding of system capabilities to newer data centers and upkeep of system capabilities in the existing sites. The SRE enables feature teams to increase the velocity at which they can safely deploy changes to production and monitor the effects of changes across the footprint. The SRE analyzes telemetry data to develop capacity planning models, identify patterns and trends that drive continuous improvement, and highlight opportunities to deploy automation to monitor and manage CIH services across sites. The SRE also participates in on-call rotations to resolve live site incidents, minimize customer impact, and document solutions and insights that inform ongoing improvements to infrastructure, code, tools, and/or processes that prevent the recurrence of similar issues.

Qualifications

Required/Minimum Qualifications:

6+ years technical experience in software engineering, network engineering, or systems administration
OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
OR Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration.

Other Requirements:

Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Preferred/Additional Qualifications:

7+ years technical experience in software engineering, network engineering, or systems administration
OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration
OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
OR Doctorate Degree in Computer Science, Information Technology, or related field.
Familiarity with one or more general purpose programming languages including but not limited to: Java, C/C++, C#, Python, JavaScript, PowerShell
Experience with the Microsoft cloud and/or stack, including O365, Azure, Windows, or other Microsoft software/services
Experience leveraging cloud architecture, applying site reliability principles, and/or demonstrating sensitivity to operational concerns
Demonstrated ability to debug, fix, and optimize code
Full-stack troubleshooting skills across network, application, hardware, management fabric, and distributed services layers
Excellent communication skills, both verbal and written

Responsibilities

Design, develop, and deliver the software engineering required to reduce operational burden by automating complex and repetitive tasks, such as onboarding of system capabilities to newer data centers and upkeep of system capabilities in the existing sites
Own deployment, availability, reliability, performance, and customer escalation targets for Critical Environment Telemetry solutions
Proactively identify and reduce issues through design, testing, and implementation of software-based solutions
Collaborate with Engineering and Program Management partners to translate customer, business, and technical requirements into architectural designs and feature releases
Drive efficiencies through software improvement and root cause analysis, resulting in improved service delivery, maturity, and scalability
Work within a highly skilled team of engineers to deliver revolutionary improvements to the system and scale them

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.

Industry-leading healthcare
Educational resources
Discounts on products and services
Savings and investments
Maternity and paternity leave
Generous time away
Giving programs
Opportunities to network and connect

  • Hyderabad, India SID Global Solutions Full time

    Job Description: Site Reliability Engineer (SRE) – Apigee Level 1. Experience: 2 to 10 years. The Site Reliability Engineer (SRE) Level 1 will be responsible for maintaining and improving the reliability, availability, and performance of the systems. This entry-level role is ideal for someone who is passionate about learning and developing their skills in system...


  • Hyderabad, India SID Global Solutions Full time

    Job Description: Site Reliability Engineer (SRE) – Apigee Level 1. GCP experience is a must. Experience: 2 to 6 years. The Site Reliability Engineer (SRE) Level 1 will be responsible for maintaining and improving the reliability, availability, and performance of the systems. This entry-level role is ideal for someone who is passionate about learning and...


  • Hyderabad, India SID Global Solutions Full time

    Job Description: Site Reliability Engineer (SRE) – Apigee Level 1. GCP experience is a must. Experience: 2 to 6 years. The Site Reliability Engineer (SRE) Level 1 will be responsible for maintaining and improving the reliability, availability, and performance of the systems. This entry-level role is ideal for someone who is passionate about learning and developing...


  • Hyderabad, India SID Global Solutions Full time

    Job Description: Site Reliability Engineer (SRE) – Apigee Level 1. GCP experience is a must. Experience: 2 to 6 years. The Site Reliability Engineer (SRE) Level 1 will be responsible for maintaining and improving the reliability, availability, and performance of the systems. This entry-level role is ideal for someone who is passionate about learning and developing...


  • Hyderabad, India SID Global Solutions Full time

    Job Description: Site Reliability Engineer (SRE) – Apigee Level 1. Experience: 2.5 to 6 years. The Site Reliability Engineer (SRE) Level 1 will be responsible for maintaining and improving the reliability, availability, and performance of the systems. This entry-level role is ideal for someone who is passionate about learning and developing their skills in system...


  • Hyderabad, India SID Global Solutions Full time

    Job Description: Site Reliability Engineer (SRE) – Apigee Level 1. GCP experience is a must. Experience: 2 to 6 years. The Site Reliability Engineer (SRE) Level 1 will be responsible for maintaining and improving the reliability, availability, and performance of the systems. This entry-level role is ideal for someone who is passionate about learning and developing...


  • Hyderabad, India SID Global Solutions Full time

    Job Description: Site Reliability Engineer (SRE) – Apigee Level 1. Experience: 2.5 to 6 years. The Site Reliability Engineer (SRE) Level 1 will be responsible for maintaining and improving the reliability, availability, and performance of the systems. This entry-level role is ideal for someone who is passionate about learning and developing their skills in...


  • Hyderabad, India SID Global Solutions Full time

    Job Description: Site Reliability Engineer (SRE) – Apigee Level 1. Experience: 2 to 10 years. The Site Reliability Engineer (SRE) Level 1 will be responsible for maintaining and improving the reliability, availability, and performance of the systems. This entry-level role is ideal for someone who is passionate about learning and developing their skills in system...


  • Hyderabad, India SID Global Solutions Full time

    Job Role: Site Reliability Engineer (SRE) – GCP Location: Hyderabad (Work from Office only) Job Type: Full Time About SIDGS: SIDGS is a premium global systems integrator and global implementation partner of Google corporation, providing Digital Solutions & Services to Fortune 500 companies. Our Digital solutions go across following domains:...


  • Hyderabad, India SID Global Solutions Full time

    Job Role: Site Reliability Engineer (SRE) – GCP. Location: Hyderabad (Work from Office only). Job Type: Full Time. About SIDGS: SIDGS is a premium global systems integrator and global implementation partner of Google corporation, providing Digital Solutions & Services to Fortune 500 companies. Our Digital solutions go across following domains: User Experience,...


  • Hyderabad, India GeekBull Consulting Full time

    Job Code: GBC-2411129. Job Role: Senior Site Reliability Engineer. Job Type: Contract-to-Hire (C2H). Duration: 6 months. Experience: 7-10 years. Location: Hyderabad. Work Location: Hyderabad/Remote. Shift Timings: 6 PM to 3 AM IST. About Company: We collaborate with a wide range of clients, from startups to industry giants in sectors like...


  • Hyderabad, India GeekBull Consulting Full time

    Job Code: GBC-2411129. Job Role: Senior Site Reliability Engineer. Job Type: Contract-to-Hire (C2H). Duration: 6 months. Experience: 7-10 years. Location: Hyderabad. Work Location: Hyderabad/Remote. Shift Timings: 6 PM to 3 AM IST. About Company: We collaborate with a wide range of clients, from startups to industry giants in sectors like Healthcare,...


  • Hyderabad, India GeekBull Consulting Full time

    Job Code: GBC-2411129. Job Role: Senior Site Reliability Engineer. Job Type: Contract-to-Hire (C2H). Duration: 6 months. Experience: 7-10 years. Location: Hyderabad. Work Location: Hyderabad/Remote. Shift Timings: 6 PM to 3 AM IST. About Company: We collaborate with a wide range of clients, from startups to industry giants in sectors like Healthcare,...


  • Hyderabad, India FedEx ACC Full time

    A Site Reliability Engineer (SRE) is an advanced DevOps role that combines software engineering and cloud capabilities to ensure the scalability, performance, and reliability of large-scale, cloud-based applications. As applications and infrastructure have become complex and cloud-based, a more proactive and software-centric approach is needed to ensure...


  • Hyderabad, Telangana, India GeekBull Consulting Full time

    We are seeking a highly skilled Senior Site Reliability Engineer to join our team at GeekBull Consulting in Hyderabad. This is a Contract-to-Hire (C2H) opportunity with a duration of 6 months. About the Role: As a Senior Site Reliability Engineer, you will be responsible for designing, developing, and maintaining infrastructure through popular Infrastructure as...