GPU Compute Cluster SRE

1 week ago


Bengaluru, Karnataka, India NVIDIA Full time

We are seeking a highly skilled GPU Compute Cluster SRE to join our team at NVIDIA. As a key member of our infrastructure team, you will be responsible for designing, implementing, and supporting large-scale production systems with high efficiency and availability.

The Farm GPU compute cluster SRE maintains large-scale production systems with high efficiency and availability using a combination of software and systems engineering practices. This is a highly specialized discipline that demands knowledge across different systems, Slurm/LSF, Unix administration, scripting, capacity management, and open-source technologies. The Farm GPU SRE is responsible for developing solutions around our large compute cluster to make it work efficiently and to improve the user experience for customers as well as the engineers supporting the cluster.

Key Responsibilities:

  • Design, implement, and support large-scale infrastructure with monitoring, logging, and alerting to meet promised uptime.
  • Engage in and improve the whole lifecycle of services—from inception and design through deployment, operation, and refinement.
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, and capacity management.
  • Maintain infrastructure and services once they are live by measuring and monitoring availability, latency, and overall system health.
  • Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
  • Practice sustainable incident response and blameless postmortems.

Requirements:

  • BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics) or equivalent.
  • 3+ years of hands-on industry experience in the above-mentioned areas.
  • Must have experience with Linux system administration (Ubuntu, CentOS/Red Hat).
  • Must have experience setting up and administering HPC cluster schedulers such as Slurm and LSF.
  • Experience in one or more of the following: Python, Perl, Bash.
  • Good understanding of open-source IT automation tools such as Ansible.
  • Interest in crafting, analyzing, and fixing large-scale distributed systems.
  • Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.

Preferred Qualifications:

  • Experience with Bright Cluster Manager (BCM).
  • Understanding of InfiniBand or Ethernet concepts.
  • Experience with high-speed storage solutions such as Lustre and GPFS.
  • Experience with MPI and PyTorch.


  • Bengaluru, Karnataka, India NVIDIA Full time

    Job Summary: We are seeking a highly skilled GPU Compute Cluster SRE to join our team at NVIDIA. As a key member of our infrastructure team, you will be responsible for designing, implementing, and supporting large-scale infrastructure with monitoring, logging, and alerting with promised uptime. Key Responsibilities: Design and implement large-scale...


  • Bengaluru, Karnataka, India NVIDIA Full time

    Job Summary: NVIDIA is seeking a highly skilled GPU Compute Cluster SRE Engineer to join our team. As a key member of our infrastructure team, you will be responsible for designing, implementing, and supporting large-scale production systems with high efficiency and availability. Key Responsibilities: Design and implement large-scale infrastructure with...


  • Bengaluru, Karnataka, India NVIDIA Full time

    Job Title: GPU Compute Cluster Engineer. NVIDIA is seeking a highly skilled GPU Compute Cluster Engineer to join our team. As a key member of our infrastructure team, you will be responsible for designing, implementing, and supporting large-scale infrastructure with monitoring, logging, and alerting with promised uptime. Key Responsibilities: Design and...

  • GPU Architect

    2 weeks ago


    Bengaluru, Karnataka, India Synopsys Inc Full time

    Job Overview: The GPU Architect will lead the design and optimization of GPU-based high-performance computing architectures tailored for OPC software solutions. They will define HPC solutions that effectively utilize current hardware and prepare us for cutting-edge GPU architectures on the horizon that will impact our industry. Key Responsibilities: Design and...

  • GPU Architect

    2 weeks ago


    Bengaluru, Karnataka, India NVIDIA Full time

    NVIDIA's Innovation in GPU Technology: NVIDIA's pioneering work in the GPU has revolutionized modern computer graphics and parallel computing. As a key player in the AI computing landscape, we're seeking a talented individual to join our team and contribute to the advancement of GPU architecture. Key Responsibilities: Collaborate with cross-functional teams to...

  • GPU Architect

    2 weeks ago


    Bengaluru, Karnataka, India NVIDIA Full time

    NVIDIA is at the forefront of technological advancement, driving innovation in the field of artificial intelligence and computer graphics. As a GPU Architect, you will play a crucial role in shaping the future of computing. Key Responsibilities: Contribute to the development and enhancement of GPU architecture and simulators, testing infrastructure, metrics,...

  • GPU Architect

    1 week ago


    Bengaluru, Karnataka, India NVIDIA Full time

    NVIDIA is a leader in the field of computer graphics and parallel computing. We are seeking a highly skilled GPU Architect to join our team and contribute to the development of our GPU architecture. Key Responsibilities: Contribute to advancing GPU Architecture and Simulators, GPU testing infrastructure, metrics, and/or compilers. Develop and enhance various...

  • GPU Architect

    6 days ago


    Bengaluru, Karnataka, India NVIDIA Full time

    NVIDIA is a leader in the field of computer graphics and parallel computing. We're looking for a talented individual to join our team as a GPU Architect. In this role, you'll be responsible for developing and enhancing various features in the GPU architecture. Key Responsibilities: Contribute to advancing GPU Architecture and Simulators, GPU testing...

  • GPU Software Engineer

    2 weeks ago


    Bengaluru, Karnataka, India Synopsys Inc Full time

    Job Overview: We are seeking a highly skilled GPU Staff/Senior Staff role to optimize and implement GPU-accelerated algorithms for OPC software in the EDA industry. This position emphasizes performance improvements and integration with existing EDA tools, requiring close peer and partner collaborations to deliver solutions at the right time that address the...

  • GPU Architect Lead

    3 days ago


    Bengaluru, Karnataka, India NVIDIA Full time

    NVIDIA is seeking a highly skilled GPU Architect Lead to contribute to the advancement of our GPU architecture and simulators. As a key member of our team, you will be responsible for developing and enhancing various features in the GPU architecture. Your expertise in C++, parallel processing, and compiler development will be invaluable in helping us push...

  • GPU Technical Lead

    4 hours ago


    Bengaluru, Karnataka, India NVIDIA Full time

    NVIDIA is seeking an experienced GPU expert to drive innovation in our GPU architecture. As a GPU Technical Lead, you will be responsible for advancing various features in the GPU architecture. Key Responsibilities: Contribute to the development and enhancement of GPU architecture and simulators. Design and implement features for future graphics and parallel...


  • Bengaluru, Karnataka, India NVIDIA Full time

    NVIDIA GPU Verification Engineer Job Description: NVIDIA is seeking a skilled Verification Engineer to verify the design and implementation of the next generation of GPUs. This role offers the opportunity to have a real impact in a technology-focused company that is pushing the frontiers of what is possible today and defining the platform for the future of...


  • Bengaluru, Karnataka, India NVIDIA Full time

    NVIDIA is seeking a skilled Verification Engineer to verify the design and implementation of cutting-edge GPUs. This role offers the opportunity to make a real impact in a technology-driven company with a global presence. Our team is passionate about parallel and visual computing, and we're united in our quest to transform the way graphics are used to solve...


  • Bengaluru, Karnataka, India Oracle Full time

    As a Senior Cluster Networking Engineer at Oracle, you will play a key role in designing and operating the network infrastructure required to run distributed AI workloads across a cluster of thousands of GPUs. Our team is responsible for provisioning, securing, and scaling the network stack to meet the needs of our customers. We are looking for adaptable...

  • GPU Engineer

    2 weeks ago


    Bengaluru, Karnataka, India Synopsys Inc Full time

    Job Overview: We are seeking a highly skilled GPU Staff/Senior Staff role to focus on optimizing and implementing GPU-accelerated algorithms for OPC software in the EDA industry. This position emphasizes performance improvements and integration with existing EDA tools, requiring close peer and partner collaborations to ensure timely delivery of solutions...


  • Bengaluru, Karnataka, India Intel Full time

    Job Title: GPU Development Engineer. Job Summary: We are seeking a talented GPU Development Engineer to join our team at Intel. As a key member of our software team, you will be responsible for developing innovative software solutions to accelerate media and video processing on Intel's graphics architecture. Key Responsibilities: Developing new software solutions...


  • Bengaluru, Karnataka, India NVIDIA Full time

    About NVIDIA: NVIDIA is a pioneering technology company that has revolutionized the world of computing. With a rich history of innovation, we have consistently pushed the boundaries of what is possible. Our journey began with the invention of the GPU in 1999, which sparked the growth of the PC gaming market and redefined modern computer graphics. Today, we are...


  • Bengaluru, Karnataka, India NVIDIA Full time

    GPU Verification Engineer: NVIDIA is seeking a talented Verification Engineer to join our team and contribute to the design and implementation of cutting-edge GPUs. As a key member of our ASIC Verification team, you will play a crucial role in verifying the correctness of our industry-leading GPUs. Key Responsibilities: Verify the design and implementation of...


  • Bengaluru, Karnataka, India NVIDIA Full time

    GPU Performance Analysis Expert: NVIDIA is a pioneer in the field of visual processing, high-performance computing, and artificial intelligence. We are seeking a highly motivated and creative engineer to join our HW architecture team, where you will work on projects that will shape the future of visual computing, automotive, and GPU systems. Key...


  • Bengaluru, Karnataka, India NVIDIA Full time

    NVIDIA is seeking a skilled Verification Engineer to verify the design and implementation of the next generation of industry-leading GPUs. This position offers the opportunity to have a real impact in a multifaceted, technology-focused company impacting product lines ranging from consumer graphics to self-driving cars and the growing field of artificial...