BitOoda | Systems/Network Engineer – High-Performance Compute GPU Infrastructure | India

2 days ago


India | BitOoda | Full time

Role Overview

As a Systems/Network Engineer, you will be responsible for architecting, deploying, and maintaining GPU-based compute infrastructure. You will work on bare-metal systems, high-speed networks, and hybrid cloud integrations to ensure maximum performance, reliability, and scalability. This role is primarily remote but may occasionally require on-site support for hardware installations or emergency maintenance.


Key Responsibilities

System Optimization

  • Configure and optimize bare-metal servers, including Linux OS, NVIDIA/AMD GPU drivers, and system libraries.
  • Fine-tune NUMA settings, CPU-GPU affinity, and storage I/O for peak performance.
  • Benchmark and tune HPC systems for specific workloads, ensuring sustained high performance.

GPU Cluster Management

  • Deploy and manage GPU clusters using job orchestration tools like Kubernetes, Slurm, or similar platforms.
  • Monitor GPU utilization, thermals, and overall system health using tools like NVIDIA DCGM, AMD ROCm SMI, and Prometheus/Grafana.

Networking

  • Design and maintain high-speed networking solutions (e.g., NVLink, InfiniBand, RDMA) for distributed GPU systems.
  • Optimize data transfer between nodes and reduce latency in cluster communication.

Storage Solutions

  • Manage and configure storage solutions such as NVMe, SSD arrays, Ceph, or Lustre for high-throughput workloads.

Automation

  • Automate system deployment, updates, and monitoring using tools like Ansible, Terraform, or Python scripts.

Security

  • Implement secure access controls, firewalls, and VPNs to protect GPU resources and user data.
  • Ensure compliance with security best practices for HPC environments.

Hybrid/Cloud Integration

  • Manage integrations between on-premises GPU clusters and cloud platforms (e.g., AWS, GCP, Azure).
  • Build and maintain hybrid HPC setups for seamless scalability.

Data Center Infrastructure

  • Work on power, cooling, and rack design for HPC setups, ensuring reliable and efficient operations.
  • Deploy and maintain systems in on-premises or hybrid cloud data center environments.



Required Qualifications

Technical Skills

  • Strong experience with Linux (CentOS, Ubuntu, RHEL) and system-level configuration.
  • Expertise in managing NVIDIA GPU ecosystems (CUDA, NVLink, NVIDIA drivers).
  • Familiarity with AMD ROCm, HIP, or OpenCL for AMD GPUs.
  • Knowledge of high-speed networking protocols (InfiniBand, RDMA, Ethernet).
  • Proficiency in scripting and automation (Python, Bash, Ansible, Terraform).
  • Experience with job orchestration tools like Kubernetes or Slurm.
  • Familiarity with containerization (Docker, NVIDIA Docker, Singularity).
  • Understanding of storage technologies, including NVMe and parallel file systems.

Soft Skills

  • Strong analytical and problem-solving skills.
  • Ability to work independently and as part of a remote team.
  • Excellent communication skills for cross-team collaboration.

Preferred Qualifications

  • Experience with hybrid cloud setups, including AWS Outposts, Azure Stack, or Google Cloud Anthos.
  • Hands-on experience with hardware management tools like IPMI/BMC for remote server management.
  • Familiarity with emerging accelerators (e.g., SambaNova, Cerebras, Graphcore).


What We Offer

  • Competitive salary and benefits package.
  • Work with a talented and collaborative team of engineers.
  • Opportunities to work on cutting-edge GPU and HPC projects.
  • A flexible and dynamic startup environment where you can grow and innovate.
  • Opportunities for professional development and continuous learning.

