HPC L3

3 weeks ago


Panvel, India Yotta Data Services Private Limited Full time

Job Scope: As an HPC Admin, you will be responsible for the management and maintenance of GPU Supercomputing clusters on NVIDIA reference architecture. You will ensure optimal performance and uptime of these critical systems, supporting high-performance computing (HPC) requirements.

Job Responsibilities:
- Configure and maintain GPU Supercomputing clusters and the associated networking configuration.
- Implement and optimize software stacks including MaaS (metal-as-a-service), Job Scheduler (SLURM/PBS), Cloud Orchestration (Kubernetes), and Network Management (NetQ for Ethernet fabric and UFM for InfiniBand).
- Conduct performance activities such as debugging, profiling, benchmarking, and tuning of GPU applications on large-scale supercomputing clusters.
- Run benchmarking applications from widely used platforms such as MLPerf Training & Inference, AI Training (PyTorch, TensorFlow, NeMo, Megatron-LM), and AI Inference (TensorRT-LLM, Triton Inference Server, vLLM).

Must-Have Skill:
- Hands-on experience with NVIDIA GPUs, particularly NVIDIA Data Centre GPUs (A100/H100).
- Experience in provisioning and managing software stacks like MaaS, Job Scheduler (SLURM/PBS), Cloud Orchestration (Kubernetes), and Network Management (NetQ for Ethernet fabric and UFM for InfiniBand).
- Prior experience collaborating with NVIDIA Solution Architect & Engineering teams on large-scale GPU-as-a-service projects.
- Familiarity with benchmarking applications from widely used platforms and frameworks, including MLPerf, PyTorch, TensorFlow, NeMo, Megatron-LM, TensorRT-LLM, Triton Inference Server, and vLLM.
- Experience in performance engineering, including debugging, profiling, benchmarking, and tuning various GPU applications on large-scale supercomputing clusters.

Good to Have Skill:
- Knowledge of other HPC technologies and architectures beyond NVIDIA, broadening expertise in the field.
- Experience with other cloud platforms and orchestration tools, expanding versatility in deployment environments.
- Strong problem-solving and troubleshooting abilities, enabling quick resolution of complex technical issues.
- Excellent communication and collaboration skills to work effectively within cross-functional teams and with external partners.

Behavioral Attributes:
- Strong problem-solving skills with a proactive and solution-oriented approach.
- Excellent communication and collaboration skills for effective customer support.
- Adaptability to handle a dynamic and fast-paced cloud administration environment.
- Commitment to security best practices and continuous improvement.

Qualification and Experience:
- Bachelor’s degree in engineering, or equivalent.
- Minimum 5+ years’ experience in IT, 5+ years of relevant experience in HPC engineering roles, with a focus on NVIDIA GPU and Networking Technologies.
- Demonstrated success in deploying and managing large-scale GPU Supercomputing clusters, preferably in collaboration with NVIDIA teams.
- Proven track record of performance engineering activities and optimizing GPU applications for high-performance computing workloads.


  • HPC L3

    3 weeks ago


    Panvel, India Yotta Data Services Private Limited Full time

    Job Scope: As an HPC Admin, you will be responsible for the management and maintenance of GPU Supercomputing clusters on NVIDIA reference architecture. You will ensure optimal performance and uptime of these critical systems, supporting high-performance computing (HPC) requirements. Job Responsibilities: - Configure and maintain GPU Supercomputing...

  • HPC L3

    2 weeks ago


    Panvel, India Yotta Data Services Private Limited Full time

    Job Scope: As an HPC Admin, you will be responsible for the management and maintenance of GPU Supercomputing clusters on NVIDIA reference architecture. You will ensure optimal performance and uptime of these critical systems, supporting high-performance computing (HPC) requirements. Job Responsibilities: - Configure and maintain GPU Supercomputing clusters...

  • HPC L3

    3 weeks ago


    Panvel, India Yotta Data Services Private Limited Full time

    Job Scope: As an HPC Admin, you will be responsible for the management and maintenance of GPU Supercomputing clusters on NVIDIA reference architecture. You will ensure optimal performance and uptime of these critical systems, supporting high-performance computing (HPC) requirements. Job Responsibilities: Configure and maintain GPU Supercomputing clusters...

  • HPC L3

    3 days ago


    Panvel, Maharashtra, India Yotta Data Services Private Limited Full time ₹ 8,00,000 - ₹ 20,00,000 per year

    Job Scope: As an HPC Admin, you will be responsible for the management and maintenance of GPU Supercomputing clusters on NVIDIA reference architecture. You will ensure optimal performance and uptime of these critical systems, supporting high-performance computing (HPC) requirements. Job Responsibilities: Configure and maintain GPU Supercomputing clusters and...

  • HPC L3

    3 weeks ago


    Panvel, India Yotta Data Services Private Limited Full time

    Job Scope: As an HPC Admin, you will be responsible for the management and maintenance of GPU Supercomputing clusters on NVIDIA reference architecture. You will ensure optimal performance and uptime of these critical systems, supporting high-performance computing (HPC) requirements. Job Responsibilities: Configure and maintain GPU Supercomputing clusters...

  • HPC L3

    3 weeks ago


    Panvel, India Yotta Data Services Private Limited Full time

    Job Scope: As an HPC Admin, you will be responsible for the management and maintenance of GPU Supercomputing clusters on NVIDIA reference architecture. You will ensure optimal performance and uptime of these critical systems, supporting high-performance computing (HPC) requirements. Job Responsibilities: Configure and maintain GPU Supercomputing clusters and...