HPC / CUDA Software Engineer

1 month ago


Chennai, India KLA Full time

Job Description

KLA’s AI Advanced Computing Labs is looking for an extraordinary HPC System R&D Engineer to join its team to develop system-level HPC technologies that would form the foundation of next-generation clusters used in KLA tools. The technologies would be developed and demonstrated on on-prem clusters that serve as testbeds for next-generation KLA tools.

Your Day-to-day Roles


  • Expose the limitations of existing CPU- and GPU-cluster-based solutions for deploying AI-based solutions at scale on on-prem and cloud infrastructure.
  • Develop system-level solutions that enable scaling out image-processing and AI workloads from a single GPU to multi-node, multi-GPU clusters.
  • Install and benchmark pre-release hardware for early-stage evaluation and prototyping, identifying (or developing) relevant workloads.
  • Explore modern HPC systems software (such as new distributions of Linux) for adoption into KLA’s tools.


Minimum Qualifications


  • Master's / PhD in Computer Science or a related field; bachelor's degree holders with relevant experience and an extraordinary track record will also be considered.
  • Deep understanding of operating systems, computer networks, and high-performance applications.
  • Good mental model of the architecture of a modern distributed system comprising CPUs, GPUs, and accelerators.
  • Experience deploying deep-learning frameworks such as TensorFlow and PyTorch on large-scale on-prem or cloud infrastructure.
  • Solid understanding of container infrastructure such as Docker or Singularity, and of Kubernetes.
  • Strong scripting skills in Bash, Python, or similar.
  • Good communication skills.


Things to Make Us Go Wow


  • Hands-on experience in architecting, building, and maintaining (against all odds) large-scale distributed HPC clusters.
  • Experience with model development on DL frameworks such as TensorFlow and PyTorch.
  • Experience with building open-source operating systems and software stacks on pre-release hardware.
  • Hands-on involvement with cluster monitoring and management tools (such as Prometheus, Grafana), scheduling, resource management, and communication libraries (like SLURM, PBS, MPI/OSHMEM), and virtualization technologies (such as KVM/VMware/Nutanix).
  • Experience working with developers who use clusters and sysadmins who maintain them.


