BitOoda | Systems/Network Engineer – High-Performance Compute GPU Infrastructure | India

1 day ago


India BitOoda Full time

Role Overview

As a Systems/Network Engineer, you will be responsible for architecting, deploying, and maintaining GPU-based compute infrastructure. You will work on bare-metal systems, high-speed networks, and hybrid cloud integrations to ensure maximum performance, reliability, and scalability. This role is primarily remote but may occasionally require on-site support for hardware installations or emergency maintenance.


Key Responsibilities

System Optimization

  • Configure and optimize bare-metal servers, including Linux OS, NVIDIA/AMD GPU drivers, and system libraries.
  • Fine-tune NUMA settings, CPU-GPU affinity, and storage I/O for peak performance.
  • Benchmark and tune HPC systems for specific workloads, ensuring sustained high performance.

GPU Cluster Management

  • Deploy and manage GPU clusters using job orchestration tools like Kubernetes, Slurm, or similar platforms.
  • Monitor GPU utilization, thermals, and overall system health using tools like NVIDIA DCGM, AMD ROCm SMI, and Prometheus/Grafana.

Networking

  • Design and maintain high-speed interconnect and networking solutions (e.g., NVLink, InfiniBand, RDMA) for distributed GPU systems.
  • Optimize data transfer between nodes and reduce latency in cluster communication.

Storage Solutions

  • Manage and configure storage solutions such as NVMe, SSD arrays, Ceph, or Lustre for high-throughput workloads.

Automation

  • Automate system deployment, updates, and monitoring using tools like Ansible, Terraform, or Python scripts.

Security

  • Implement secure access controls, firewalls, and VPNs to protect GPU resources and user data.
  • Ensure compliance with security best practices for HPC environments.

Hybrid/Cloud Integration

  • Manage integrations between on-premises GPU clusters and cloud platforms (e.g., AWS, GCP, Azure).
  • Build and maintain hybrid HPC setups for seamless scalability.

Data Center Infrastructure

  • Work on power, cooling, and rack design for HPC setups, ensuring reliable and efficient operations.
  • Deploy and maintain systems in on-premises or hybrid cloud data center environments.



Required Qualifications

Technical Skills

  • Strong experience with Linux (CentOS, Ubuntu, RHEL) and system-level configuration.
  • Expertise in managing NVIDIA GPU ecosystems (CUDA, NVLink, NVIDIA drivers).
  • Familiarity with AMD ROCm, HIP, or OpenCL for AMD GPUs.
  • Knowledge of high-speed networking technologies (InfiniBand, RDMA, Ethernet).
  • Proficiency in scripting and automation (Python, Bash, Ansible, Terraform).
  • Experience with job orchestration tools like Kubernetes or Slurm.
  • Familiarity with containerization (Docker, NVIDIA Docker, Singularity).
  • Understanding of storage technologies, including NVMe and parallel file systems.

Soft Skills

  • Strong analytical and problem-solving skills.
  • Ability to work independently and as part of a remote team.
  • Excellent communication skills for cross-team collaboration.

Preferred Qualifications

  • Experience with hybrid cloud setups, including AWS Outposts, Azure Stack, or GCP Anthos.
  • Hands-on experience with hardware management tools like IPMI/BMC for remote server management.
  • Familiarity with emerging accelerators (e.g., SambaNova, Cerebras, Graphcore).


What We Offer

  • Competitive salary and benefits package.
  • Work with a talented and collaborative team of engineers.
  • Opportunities to work on cutting-edge GPU and HPC projects.
  • A flexible and dynamic startup environment where you can grow and innovate.
  • Opportunities for professional development and continuous learning.

