Senior Cloud Reliability Engineer

4 weeks ago


Bengaluru, Karnataka, India NVIDIA Full time

About the Role:

NVIDIA is seeking a highly skilled Senior Site Reliability Engineer to join our team. As a key member of our SRE team, you will be responsible for designing, implementing, and supporting large-scale Kubernetes clusters with monitoring, logging, and alerting.

Key Responsibilities:

  • Design and implement large-scale Kubernetes clusters with monitoring, logging, and alerting.
  • Engage in the whole lifecycle of services, from inception and design, through deployment, operation, and refinement.
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity management, and launch reviews.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
  • Practice sustainable incident response and blameless postmortems.
  • Be part of an on-call rotation to support production systems.

Requirements:

  • A minimum of 3 years of hands-on experience in the setup, administration, and maintenance of multiple large (100+ nodes) Kubernetes clusters, both on-premises and on cloud service providers such as AWS, Azure, GCP, and OCI.
  • Strong coding experience in one or more of the following languages: Go, Python, Perl, Java, C, C++, Ruby.
  • At least 2 years of hands-on system administration experience in large-scale UNIX production environments, with proven debugging and troubleshooting skills.
  • Ability to maintain platform SLAs through accurate issue resolution.
  • Outstanding teammate who can collaborate and influence in a multifaceted environment.
  • Demonstrable experience in handling algorithms, data structures, complexity analysis, and software design.
  • BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics).

Preferred Qualifications:

  • Experience in using or running large private and public cloud systems based on Kubernetes, OpenStack, and Docker.
  • Demonstrated ability to automate routine tasks, debug, and optimize existing code.
  • Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
  • Hands-on experience with network and storage administration.
  • Unit testing and benchmarking are an integral part of your code.
  • Ability to reason about and choose the best possible algorithm to meet scaling and availability challenges.
  • Ability to decompose complex requirements into simple tasks and reuse available solutions to implement most of them.


  • Bengaluru, Karnataka, India Lowe's India Full time

    About Lowe's India: Lowe's India is a leading provider of innovative technology products and solutions, serving as the backbone for Lowe's customers through omnichannel experiences. With a strong presence in Bengaluru, Lowe's India employs over 4,200 associates across various functions, including technology, analytics, merchandising, supply chain, marketing,...


  • Bengaluru, Karnataka, India SolarWinds Full time

    At SolarWinds, we're dedicated to helping customers achieve business transformation through innovative solutions. Our mission is built on collaboration, accountability, and a passion for innovation. The Role: We're seeking a Senior Cloud Reliability Engineer to join our team and contribute to the development and management of our cloud infrastructure. As a key...


  • Bengaluru, Karnataka, India Qlik Full time

    About the Role: We are seeking a highly skilled Senior Site Reliability Engineer to join our team at Qlik. As a Senior Site Reliability Engineer, you will play a crucial role in ensuring the scalability, observability, and reliability of our SaaS environments. Key Responsibilities: Scale our SaaS architecture to support the needs of our growing user base and...


  • Bengaluru, Karnataka, India Barracuda Full time

    About the Role: We are seeking a highly skilled and passionate Senior Site Reliability Engineer to join our cross-functional Agile team at Barracuda Central Intelligence. As a key member of our team, you will be responsible for managing production services and collaborating with Engineering and Operations teams to ensure reliability, scalability, and...


  • Bengaluru, Karnataka, India Cloud Software Group Full time

    About This Team: At Cloud Software Group, we're pushing the boundaries of cloud-based solutions. Our Citrix Workspace App team is at the forefront, working on secure delivery of virtual apps to any device, anywhere. As a Senior Software Engineer, you'll be part of a collaborative and customer-focused environment, working well in a team. Expect a fast but...


  • Bengaluru, Karnataka, India Groww Full time

    About Us: At Groww, we are committed to making financial services accessible to every Indian through a multi-product platform. Our team is passionate about creating an exceptional experience for our customers, with a focus on customer obsession and customer-centricity. Job Description: We are seeking an experienced Senior Site Reliability Engineer to join our...


  • Bengaluru, Karnataka, India Cisco Full time

    About the Role: Cisco is seeking a talented Cloud Reliability Engineer to join our team. As a Cloud Reliability Engineer, you will be responsible for ensuring the high availability of our cloud services. You will work closely with our software development, operations, and infrastructure teams to deliver reliable and efficient cloud solutions. Your Key...


  • Bengaluru, Karnataka, India Cloud Software Group Full time

    About Cloud Software Group: Cloud Software Group is a leading provider of cloud-based solutions, serving over 1 million users worldwide. We value diverse perspectives, innovation, and the courage to take risks. Our teams are empowered to learn, dream, and build the future of work. About This Role: We are seeking a Senior Cloud Software Development Engineer to...


  • Bengaluru, Karnataka, India Wipro Full time

    Job Title: Cloud Reliability Engineer. About the Role: We are seeking a highly skilled Cloud Reliability Engineer to join our team at Wipro. As a Cloud Reliability Engineer, you will be responsible for ensuring the reliability and security of our cloud infrastructure. Key Responsibilities: Design and implement scalable and secure cloud infrastructure using...


  • Bengaluru, Karnataka, India Microsoft Full time

    About the Role: Microsoft is a company where passionate innovators come to collaborate, envision what can be, and take their careers further. This is a world of more possibilities, more innovation, more openness, and the sky is the limit thinking in a cloud-enabled world. The Azure Data engineering team is leading the transformation of analytics in the world of...


  • Bengaluru, Karnataka, India NVIDIA Full time

    Job Summary: NVIDIA is seeking a highly skilled Senior Cloud Reliability Engineer to join our team. As a key member of our Site Reliability Engineering (SRE) team, you will be responsible for designing, implementing, and supporting large scale Kubernetes clusters with monitoring, logging, and alerting. Key Responsibilities: Design and implement large scale...


  • Bengaluru, Karnataka, India Pearson Full time

    The Senior Cloud Reliability Engineer plays a pivotal role in ensuring the security, reliability, and performance of Pearson's critical services. This position demands a versatile professional who can contribute across development, system operations, resiliency testing, security hardening, and performance engineering. Key responsibilities include: Addressing...


  • Bengaluru, Karnataka, India Microsoft Full time

    Overview: Microsoft is a company where passionate innovators come to collaborate, envision what can be, and take their careers further. This is a world of more possibilities, more innovation, more openness, and the sky is the limit thinking in a cloud-enabled world. Our Mission: The Azure Data engineering team is leading the transformation of analytics in the...


  • Bengaluru, Karnataka, India Sumo Logic Full time

    Job Summary: Sumo Logic is seeking a highly skilled Senior Site Reliability Engineer to join our global team. As a key member of our SRE team, you will be responsible for ensuring the sustained operational excellence of our planet-scale observability and security products. Responsibilities: Design and implement scalable systems and architectures to support our...


  • Bengaluru, Karnataka, India WELLS FARGO BANK Full time

    Job Title: Cloud Reliability Engineer. About Wells Fargo Bank: We are a leading financial services company, dedicated to helping our customers achieve their financial goals. Our commitment to innovation and customer satisfaction has made us one of the largest banks in the United States. Job Summary: We are seeking a highly skilled Cloud Reliability Engineer to...


  • Bengaluru, Karnataka, India Guidewire Full time

    Resilient Infrastructure Engineer: We are looking for a Cloud Reliability Engineer to join our team at Guidewire. As a Cloud Reliability Engineer, you will be responsible for ensuring the availability, scalability, and security of our cloud-based infrastructure. Key Responsibilities: Design and implement highly available and scalable cloud infrastructure. Develop...


  • Bengaluru, Karnataka, India Wayfair Full time

    About the Role: We're seeking a skilled Cloud Reliability Advocate to join our Reliability Engineering team at Wayfair. As a key member of our team, you'll play a vital role in ensuring the reliability, scalability, and performance of our cloud-based platforms. Main Responsibilities: Design and implement scalable and reliable cloud-based systems, leveraging...


  • Bengaluru, Karnataka, India Groww Full time

    About Groww: We are a passionate group of people focused on making financial services accessible to every Indian through a multi-product platform. Our mission is to empower individuals to take control of their financial journey. We strive to create a culture that fosters innovation, collaboration, and customer-centricity. Our Vision: Our long-term vision is to...


  • Bengaluru, Karnataka, India BlackLine Full time

    Job Description: At BlackLine, we're committed to bringing passion and customer focus to the business of enterprise applications. We're seeking an experienced Senior Site Reliability Engineer to lead the development of new Cloud native tools and services with extensive experience with deploying and monitoring Infrastructure as code (IAC). Key...


  • Bengaluru, Karnataka, India Lowe's India Full time

    About Lowe's: Lowe's Companies, Inc. is a Fortune 50 home improvement company serving approximately 16 million customer transactions a week in the United States. With total fiscal year 2023 sales of more than $86 billion, Lowe's operates over 1,700 home improvement stores and employs approximately 300,000 associates. Lowe's India develops innovative technology...