BitOoda | Systems/Network Engineer – High-Performance Compute GPU Infrastructure

2 days ago


India BitOoda Full time

Role Overview

As a Systems/Network Engineer, you will be responsible for architecting, deploying, and maintaining GPU-based compute infrastructure. You will work on bare-metal systems, high-speed networks, and hybrid cloud integrations to ensure maximum performance, reliability, and scalability. This role is primarily remote but may occasionally require on-site support for hardware installations or emergency maintenance.


Key Responsibilities

System Optimization

  • Configure and optimize bare-metal servers, including Linux OS, NVIDIA/AMD GPU drivers, and system libraries.
  • Fine-tune NUMA settings, CPU-GPU affinity, and storage I/O for peak performance.
  • Benchmark and tune HPC systems for specific workloads, ensuring sustained high performance.

GPU Cluster Management

  • Deploy and manage GPU clusters using job orchestration tools like Kubernetes, Slurm, or similar platforms.
  • Monitor GPU utilization, thermals, and overall system health using tools like NVIDIA DCGM, AMD ROCm SMI, and Prometheus/Grafana.

Networking

  • Design and maintain high-speed networking solutions (e.g., NVLink, InfiniBand, RDMA) for distributed GPU systems.
  • Optimize data transfer between nodes and reduce latency in cluster communication.

Storage Solutions

  • Manage and configure storage solutions such as NVMe, SSD arrays, Ceph, or Lustre for high-throughput workloads.

Automation

  • Automate system deployment, updates, and monitoring using tools like Ansible, Terraform, or Python scripts.

Security

  • Implement secure access controls, firewalls, and VPNs to protect GPU resources and user data.
  • Ensure compliance with security best practices for HPC environments.

Hybrid/Cloud Integration

  • Manage integrations between on-premises GPU clusters and cloud platforms (e.g., AWS, GCP, Azure).
  • Build and maintain hybrid HPC setups for seamless scalability.

Data Center Infrastructure

  • Work on power, cooling, and rack design for HPC setups, ensuring reliable and efficient operations.
  • Deploy and maintain systems in on-premises or hybrid cloud data center environments.



Required Qualifications

Technical Skills

  • Strong experience with Linux (CentOS, Ubuntu, RHEL) and system-level configuration.
  • Expertise in managing NVIDIA GPU ecosystems (CUDA, NVLink, NVIDIA drivers).
  • Familiarity with AMD ROCm, HIP, or OpenCL for AMD GPUs.
  • Knowledge of high-speed networking protocols (InfiniBand, RDMA, Ethernet).
  • Proficiency in scripting and automation (Python, Bash, Ansible, Terraform).
  • Experience with job orchestration tools like Kubernetes or Slurm.
  • Familiarity with containerization (Docker, NVIDIA Container Toolkit (formerly NVIDIA Docker), Singularity/Apptainer).
  • Understanding of storage technologies, including NVMe and parallel file systems.

Soft Skills

  • Strong analytical and problem-solving skills.
  • Ability to work independently and as part of a remote team.
  • Excellent communication skills for cross-team collaboration.

Preferred Qualifications

  • Experience with hybrid cloud setups, including AWS Outposts, Azure Stack, or GCP Anthos.
  • Hands-on experience with hardware management tools like IPMI/BMC for remote server management.
  • Familiarity with emerging accelerators (e.g., SambaNova, Cerebras, Graphcore).


What We Offer

  • Competitive salary and benefits package.
  • Work with a talented and collaborative team of engineers.
  • Opportunities to work on cutting-edge GPU and HPC projects.
  • A flexible and dynamic startup environment where you can grow and innovate.
  • Opportunities for professional development and continuous learning.

