HPC Solutions Specialist

19 hours ago


Hyderabad / Secunderabad, Telangana, India beBeeEngineering Full time ₹ 15,00,000 - ₹ 28,00,000
High-Performance Computing Systems Engineer

We are seeking an experienced professional with 7+ years of expertise in high-performance computing (HPC) environments. This role requires hands-on experience with Python, Kubernetes (K8s), Slurm, OpenStack, and Ansible, along with the ability to support external clients in live troubleshooting sessions.

The ideal candidate will have deep technical knowledge of drivers, troubleshooting methods, and system-level debugging, and will play a key role in managing, optimizing, and troubleshooting HPC clusters and cloud-based HPC environments.

HPC System Administration & Troubleshooting
  • Manage and optimize HPC clusters, ensuring high availability and performance.
  • Troubleshoot GPU, CPU, network drivers, firmware, and OS-level issues.
  • Debug storage, networking, and job scheduling bottlenecks in Slurm-based environments.
Kubernetes & Cloud HPC Environments
  • Deploy and manage HPC workloads in Kubernetes for AI/ML and parallel computing.
  • Optimize OpenStack-based HPC clusters with Ceph, Cinder, and Neutron for cloud scalability.
  • Implement containerized HPC workflows using Kubernetes and OpenShift.
Automation & Infrastructure as Code (IaC)
  • Develop Ansible and Terraform scripts for provisioning and managing HPC resources.
  • Automate job scheduling, cluster monitoring, and log analysis using Python.
  • Optimize CI/CD pipelines for HPC and AI/ML applications.
Performance Tuning & Benchmarking
  • Benchmark and optimize multi-node HPC workloads (MPI, NCCL, ROCm, CUDA).
  • Tune OS parameters, networking (InfiniBand, RoCE), and Slurm configurations for peak performance.
  • Enhance HPC storage performance (Ceph, Lustre, NFS) and distributed computing efficiency.
Client Support & Collaboration
  • Provide real-time technical support and troubleshooting for HPC users.
  • Engage with developers, DevOps, and system administrators to optimize cluster performance.
  • Document solutions and best practices, and contribute to internal knowledge bases.
Preferred Qualifications
  • Experience with AMD MI300 or MI2X0 GPUs, ROCm, MPI, UCX, or XPMEM.
  • Exposure to containerized workloads using Singularity or Docker in HPC.
  • Familiarity with OpenStack deployment automation (e.g., TripleO, Kolla, or OpenStack-Ansible).
  • Experience in customer-facing technical roles, with a strong ability to troubleshoot live issues.


  • Hyderabad / Secunderabad, Telangana, India beBeeHighPerformanceComputing Full time ₹ 1,50,000 - ₹ 28,00,000

    Our team is seeking a highly skilled HPC Specialist to join our infrastructure services team. This specialist will be responsible for the operational health, deployment, and lifecycle of HPC technology, ensuring it runs at optimal levels and meets our business needs. The role involves designing innovative solutions using projects and qualification frameworks to...


  • Hyderabad / Secunderabad, Telangana, India beBeeHpc Full time

    Job Description: We are seeking a highly skilled and experienced HPC specialist to join our team. As an HPC specialist, you will be responsible for the operational health, deployment, and lifecycle of HPC technology. You will design innovative solutions utilizing projects and qualification frameworks to run and optimize complex infrastructures. Additionally, you...

  • Senior HPC Engineer

    2 weeks ago


    Hyderabad / Secunderabad, Telangana, India beBeeHighPerformanceComputingEngineer Full time

    Job Description: This position involves the deployment, maintenance, and support of HPC infrastructure in a multi-cloud environment. The ideal candidate will have hands-on engineering experience with deep technical expertise in HPC technology and standard methodologies. Implement and manage cloud-based infrastructure that supports HPC environments for data...

  • HPC Applications

    4 days ago


    Madhapur, Hyderabad, Telangana, India Locuz Enterprise Solutions Full time ₹ 15,000 - ₹ 28,00,000 per year

    L2 Skill HPC Engineer with Application Expertise. Role Overview: An L2 HPC (High-Performance Computing) Engineer with an application skillset is responsible for supporting, troubleshooting, and maintaining HPC infrastructure and assisting users with scientific and engineering applications. They operate between infrastructure and application layers, ensuring...


  • Hyderabad / Secunderabad, Telangana, India beBeeHighperformancecomputing Full time ₹ 15,00,000 - ₹ 28,00,000

    Job Title: Senior High Performance Computing Engineer. We are seeking a Senior High-Performance Computing (HPC) professional to deploy, maintain, and support HPC infrastructure in a multi-cloud environment. This hands-on role requires deep technical expertise in HPC technology and is vital for supporting data science, AI/ML workflows, and image analysis. Key...


  • Hyderabad / Secunderabad, Telangana, India beBeeCloud Full time

    Cloud HPC Engineer. Position Overview: We are seeking a seasoned Cloud High-Performance Computing (HPC) Engineer to join our team. This role is responsible for designing, implementing, and managing cloud-based infrastructure that supports HPC environments. As a key member of our engineering team, you will collaborate with data scientists and ML engineers to...


  • Hyderabad / Secunderabad, Telangana, India beBeeHighPerformance Full time ₹ 15,00,000 - ₹ 20,00,000

    System Administrator for High-Performance Computing Environments. Are you an experienced professional in high-performance computing environments? We are seeking a skilled System Administrator to manage, optimize, and troubleshoot HPC clusters and cloud-based environments. This role requires hands-on experience with Python, Kubernetes (K8s), Slurm, OpenStack,...

  • HPC Applications

    4 days ago


    Madhapur, Hyderabad, Telangana, India Locuz Enterprise Solutions Full time

    L2 Skill HPC Engineer with Application Expertise. Role Overview: Core Responsibilities: HPC Cluster Support: Manage day-to-day operations of HPC clusters (Slurm, PBS, LSF), monitor jobs and node health, and manage user issues at L2. Application Support & Optimization: User & Job Management: Handle user access, and...


  • Hyderabad / Secunderabad, Telangana, India beBeeHPC Full time ₹ 9,00,000 - ₹ 12,00,000

    High-Performance Computing Expert. HPC System Administration & Troubleshooting: Manage and optimize HPC clusters for high availability and performance. Troubleshoot GPU, CPU, network drivers, firmware, and OS-level issues to ensure smooth operation. Debug storage, networking, and job scheduling bottlenecks in Slurm-based environments for optimal...


  • Hyderabad / Secunderabad, Telangana, India beBeeHighperformance Full time ₹ 1,04,000 - ₹ 1,30,878

    Lead High-Performance Computing (HPC) Engineer. HPC System Administration & Troubleshooting: Benchmark, optimize, and troubleshoot complex HPC systems to ensure high availability, performance, and reliability. Solve GPU, CPU, network drivers, firmware, and OS-level issues efficiently using technical expertise. Debug storage, networking, and job scheduling bottlenecks...