Sr. Site Reliability Engineer

7 days ago


Hyderabad, Telangana, India Microsoft Full time
Overview


Are you interested in working for one of the most exciting products at Microsoft, passionate about exceeding customer expectations and advancing Microsoft's cloud-first strategy? Are you interested in a start-up-like environment, passionate about cloud computing technology and driving growth in one of Microsoft's core businesses? If so, then look no further than the Azure Customer Experience (CXP), Customer Reliability Engineering (CRE) Team. Microsoft Azure provides on-demand and infinitely scalable infrastructure and a platform for customers to build, host, and scale service applications on the Internet through Microsoft's global data centers.

Azure CXP CRE is a top-level pillar of Azure Engineering that leads world-class customer reliability engagements and modern customer-first experiences for scale, and drives deep customer insights and empathy into the broader Azure Engineering organization.

Our team prioritizes customer feedback to enhance Azure services, support, incident management, and community interactions. Our commitment to no dead-ends guarantees that all customers can maximize their potential with the Microsoft Cloud.

Our organization is looking for a passionate, self-motivated Customer Reliability Engineer with extensive experience in designing, developing, implementing, debugging, and troubleshooting cloud infrastructure and monitoring solutions.

As a key member of our team, you will play a critical role in ensuring the reliability, availability, and performance of Synthetic infrastructure hosted in Microsoft Azure.

You will be responsible for designing, implementing, and maintaining robust Synthetic workloads and their monitoring systems to track and meet the service level objectives defined in our offerings to internal consumers.

You will be accountable for improving customer experience on Azure and for diagnosing and troubleshooting mission-critical customer applications built on the Microsoft Azure platform.

This position is critical to the success of our team's charter and embodies our inclusive culture, growth & learning mindsets, and unwavering dedication to diversity.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. Azure aspires to be the world's computer.

As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals.

Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Qualifications

Required Qualifications:
8+ years technical experience in software engineering, network engineering, or systems administration OR bachelor's degree in computer science, Information Technology, or related field AND 4+ year(s) technical experience in software engineering, network engineering, or systems administration OR master's degree in computer science, Information Technology, or related field

4+ years of industry experience as an SRE, with proven experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.

3+ years of demonstrated technical, cross-group collaboration experience and proficient communication skills.

Preferred Qualifications:


4+ years of professional software engineering experience designing, building, and running cloud services at large scale, using languages including, but not limited to, C#, C++, or Java.

Familiarity with distributed systems and event driven architecture.
Technical expertise in Azure services and capabilities and/or other cloud platforms.
Site Reliability Engineering experience in 24 x 7 x 365 enterprise environments.
Experience with Linux system administration tasks and container orchestration platforms such as Kubernetes (K8s) and AKS.
Experience with high-throughput, customer-facing REST APIs.
Front-end experience with Angular, HTML/CSS, JavaScript, and/or TypeScript
Strong coding, debugging and problem-solving skills.
Interest in delivering and influencing large transformational projects.
Experience with DevOps, CI/CD, Infrastructure as Code (IaC), and monitoring and logging platforms such as Grafana and Prometheus.
Able to work efficiently, prioritize workflow, and meet deadlines.
Ability to communicate with a variety of audiences, including high-profile customers, executive management, and engineering teams.
Proven ability to deal with ambiguity and drive for clarity.
The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.

These requirements include, but are not limited to, the following specialized security screenings:

Microsoft Cloud Background Check:

This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Responsibilities


To be successful in this role, you must have a great track record of customer compassion, an engineering mindset, an innate aptitude for agility, and technical excellence in Site Reliability Engineering.

Develop a foundational understanding of distributed systems design, interactions between cloud technology layers and components, basic dependencies at scale, and the code that defines infrastructures.

Develop an understanding of the code, features, and operations of Synthetic infrastructure at scale as required to contribute to incremental improvements in infrastructure availability, reliability, efficiency, observability, and/or performance; participate in on-boarding, code/design reviews, and regular meetings with the engineering teams that develop and/or manage those infrastructure components.

Develop Synthetic workloads to improve the observability, reliability, and operability of a defined range of platforms, systems, and features with direction from other engineers.

Support ongoing engagements with product engineering teams by participating in code/design reviews and regular meetings throughout Synthetic infrastructure development and operations cycles; draw insights from these engagements and from basic analyses of telemetry data to propose potential improvements to code and designs for a defined set of product components or features with guidance from other engineers.

Implement simple configuration and data changes across Synthetic workloads or features with guidance from other engineers to develop an understanding of how configurations, binaries, and data can be managed using code, tooling, and automation at scale.

Use existing tools to troubleshoot problems or flaws affecting the availability, reliability, performance, and/or efficiency of components or features with guidance from other engineers.

Suggest potential solutions to resolve and prevent recurring issues and bring them to the attention of other engineers or team leaders.

Participate in on-call rotations, including incident response and mitigation within the infrastructure.

During on-call rotations, evaluate the impact levels of incidents, resolve basic issues, notify product teams or owners about substantial customer-affecting concerns, and escalate the resolution of intricate or multi-component/feature issues to other engineers as required.

Communicate incident details and resolutions through post-mortem reports and in regular review meetings.

Develop an understanding of key learnings, insights, and best practices that can be applied to improve system, platform, and/or product development and operations by participating in code/design reviews, incident drills and debriefs, and regular meetings, as well as interactions with more experienced Site Reliability Engineers (SREs) and members of product engineering teams.

Collaborate closely with Engineering/PM to ensure the availability and performance of Live Site and the satisfaction of our customers.
Drive continuous improvement in the Azure platform, incorporating feedback from internal and external customers.
Be an enthusiastic, self-motivated individual and a great teammate with excellent collaboration, organizational, and time management skills.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.

Industry leading healthcare
Educational resources
Discounts on products and services
Savings and investments
Maternity and paternity leave
Generous time away
Giving programs
Opportunities to network and connect
