Senior Site Reliability Engineer

3 weeks ago


Bengaluru, India NetApp Full time

Title: Senior Site Reliability Engineer

Location: Bangalore, Karnataka, IN, 560071

Requisition ID: 126263

Job Summary

As a Cloud Infrastructure/Site Reliability Engineer, you will operate at the intersection of development and operations. Your role will involve engaging in and enhancing the lifecycle of cloud services - from design through deployment, operation, and refinement. You will maintain these services by measuring and monitoring their availability, latency, and overall system health. 
You will play a crucial role in sustainably scaling systems through automation and driving changes that improve reliability and velocity. As part of your responsibilities, you will administer cloud-based environments that support our SaaS/IaaS offerings, implemented on a microservices, container-based architecture (Kubernetes).
In addition, you will oversee a portfolio of customer-centric cloud services (SaaS/IaaS), ensuring their overall availability, performance, and security. You will work closely with both NetApp and cloud service provider teams, including those from Google, located in regions across the globe.
Due to the critical nature of the services we support, this position involves participation in a rotation-based on-call schedule as part of our global team. This role offers the opportunity to work in a dynamic, global environment, ensuring the smooth operation of vital cloud services. To be successful in this role, you should be a motivated self-starter and self-learner, possess strong problem-solving skills, and be someone who embraces challenges.

Job Requirements

Incident Response and Troubleshooting: Address and perform root cause analysis (RCA) of complex live production incidents and cross-platform issues involving OS, Networking, and Database in cloud-based SaaS/IaaS environments. Implement SRE best practices for effective resolution.
Monitoring, Analysis, and Infrastructure Maintenance: Continuously monitor, analyze, and measure system health, availability, and latency using tools like Prometheus, Stackdriver, ElasticSearch, Grafana, and SolarWinds. Develop strategies to enhance system and application performance, availability, and reliability. In addition, maintain and monitor the deployment and orchestration of servers, Docker containers, databases, and general backend infrastructure. Document system knowledge as you acquire it, create runbooks, and ensure critical system information is readily accessible.
Security Management: Stay updated with security protocols and proactively identify, diagnose, and resolve complex security issues.
Automation and Efficiency: Identify tasks and areas where automation can be applied to achieve time efficiencies and risk reduction. Develop software for deployment automation, packaging, and monitoring visibility.
Issue Tracking and Resolution: Use Atlassian Jira, Google Buganizer, and Google IRM to track and resolve issues based on their priority.
Team Collaboration and Influence: Work in tandem with other Cloud Infrastructure Engineers and developers to ensure maximum performance, reliability, and automation of our deployments and infrastructure. Additionally, consult and influence developers on new feature development and software architecture to ensure scalability.
Debugging, Troubleshooting, and Advanced Support: Undertake debugging and troubleshooting of service bottlenecks throughout the entire software stack. Additionally, provide advanced tier 2 and 3 support for NetApp's Cloud Data Services solutions.
Directly influence the decisions and outcomes related to solution implementation: measure and monitor availability, latency, and overall system health.
Proficiency in Linux/Unix and CoreOS.
Demonstrated experience in scripting and infrastructure automation using tools such as Ansible, Python, Go, or Ruby.
Deep working knowledge of containers, Kubernetes, and serverless computing implementation.

Education

8 to 12 years of experience is required. A Bachelor of Science degree in Computer Science, a master’s degree, or equivalent experience is required.


Job Segment: Cloud, Computer Science, Software Engineer, Engineer, Linux, Technology, Engineering



  • Bengaluru, India nference Full time

    Senior Site Reliability Engineer (SRE) Job Location: Bangalore Work Mode: Hybrid (3 days in the office, 2 days remote) As a Senior Site Reliability Engineer (SRE) at Nference, you will ensure the reliability, scalability, and performance of our nSights platform. Collaborate closely with engineering teams to design, build, and maintain systems supporting our...


  • Bengaluru, India Qlik Full time

    Description What makes us Qlik? A Gartner Magic Quadrant Leader for 13 years in a row, Qlik transforms complex data landscapes into actionable insights, driving strategic business outcomes. Serving over 40,000 global customers, our portfolio leverages pervasive data quality and advanced AI/ML capabilities that lead to better decisions, faster. We...


  • Bengaluru, India Barracuda Full time

    Job ID: 25-251 Come Join Our Passionate Team! At Barracuda, we make the world a safer place. We believe every business deserves access to cloud-enabled, enterprise-grade security solutions that are easy to buy, deploy, and use. We protect email, networks, data and applications with innovative solutions that grow and adapt with our customers’ journey. More...


  • Bengaluru, India Okta, Inc. Full time

    Get to know Okta Okta is The World’s Identity Company. We free everyone to safely use any technology—anywhere, on any device or app. Our Workforce and Customer Identity Clouds enable secure yet flexible access, authentication, and automation that transforms how people move through the digital world, putting Identity at the heart of business security...


  • Bengaluru, India Oracle Full time

    Building off our Cloud momentum, Oracle has formed a new organization - Oracle Health Applications & Infrastructure. This team will focus on product development and product strategy for Oracle Health while building out a complete platform supporting modernized, automated healthcare. This is a net new line of business, constructed with an entrepreneurial...


  • Bengaluru, India Barracuda Full time

    Job ID 25-281 Come Join Our Passionate Team! At Barracuda, we make the world a safer place. We believe every business deserves access to cloud-enabled, enterprise-grade security solutions that are easy to buy, deploy, and use. We protect email, networks, data, and applications with innovative solutions that grow and adapt with our customers’ journey. More...


  • Bengaluru, India We IT Global AB Full time

    This is a remote position. Join Our Team as a Senior SRE Engineer. Are you a seasoned Senior Site Reliability Engineer with robust hands-on experience in GCP & Azure coupled with a strong background in data management? Look no further! Key Responsibilities: Collaborate closely with our team to provide technical guidance and leadership. Conduct regular follow-ups and lead by...


  • Bengaluru, India Mimecast Full time

    Senior Site Reliability Engineering (SRE) (Cloud and Containerization) – Platform Devops Team. The driving force behind Platform Devops Team at Mimecast. Dive into Platform DevOps team to drive efficiency and excellence across our platforms. Our team collaborates with engineering teams to expedite end-to-end delivery lifecycles and streamline workload migrations...


  • Bengaluru, India Oracle Full time

    Title: Senior Database Site Reliability Engineer Job Description: Building off our Cloud momentum, Oracle has formed a new organization - Oracle Health Applications & Infrastructure. This team will focus on product development and product strategy for Oracle Health while building out a complete platform supporting modernized, automated healthcare....


  • Bengaluru, India Mimecast Full time

    Senior Site Reliability Engineer – Data Retention. The driving force behind our award-winning Data Retention platform at Mimecast. Dive into the forefront of innovation with our Data Retention engineering team, taking on the crucial Operations role to help us develop operational aspects of our archiving and security software and its associated...


  • Bengaluru, India Cricbuzz.com Full time

    Site Reliability Engineer We are looking for a highly skilled and motivated Web Server Site Reliability Engineer to join our team. As a Web Server Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our web server infrastructure and CDN services. Experience - 3 - 5 years Responsibilities: ●...


  • Bengaluru, India Cricbuzz.com Full time

    Site Reliability Engineer We are looking for a highly skilled and motivated Web Server Site Reliability Engineer to join our team. As a Web Server Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our web server infrastructure and CDN services. Experience - 4 - 5 years Responsibilities: ● Design,...


  • Bengaluru, India Autodesk Full time

    Position Overview Want to help make a better world? As a Senior Site Reliability Engineer (SRE) at Autodesk, you can do just that. How is this possible? As a member of the team responsible for operating critical customer-facing services, you will have the opportunity to contribute to and drive improvements in the operation of mission-critical components...


  • Bengaluru, India nference Full time

    Staff/Senior Staff - Site Reliability Engineer (SRE) As a Staff Site Reliability Engineer (SRE) at Nference, you will ensure the reliability, scalability, and performance of our nSights platform. Collaborate closely with engineering teams to design, build, and maintain systems supporting our global partners and customers. Key Responsibilities: System Design and...


  • Bengaluru, India Qure.ai Full time

    About the job Job Title: Site Reliability Engineer Department: Engineering Location: Bangalore Years of experience: 2-5 years Type: Full Time Employment About Qure.ai: Qure.ai is one of the fastest-growing startups in India, which develops Artificial Intelligence enabled products and platforms for healthcare diagnostics. We create cutting-edge solutions that positively...


  • Bengaluru, India Wow Labz Full time

    Role Description: As a Senior Site Reliability Engineer, you will be responsible for ensuring the reliability, availability, and performance of our systems and services. You will work closely with our engineering and operations teams to design, build, and maintain scalable and reliable infrastructure. Your role will also involve automating processes,...