BitOoda | Systems/Network Engineer – High-Performance Compute GPU Infrastructure

2 days ago


Hyderabad, India | BitOoda | Full time

Role Overview

As a Systems/Network Engineer, you will be responsible for architecting, deploying, and maintaining GPU-based compute infrastructure. You will work on bare-metal systems, high-speed networks, and hybrid cloud integrations to ensure maximum performance, reliability, and scalability. This role is primarily remote but may occasionally require on-site support for hardware installations or emergency maintenance.

Key Responsibilities

System Optimization

  • Configure and optimize bare-metal servers, including Linux OS, NVIDIA/AMD GPU drivers, and system libraries.
  • Fine-tune NUMA settings, CPU-GPU affinity, and storage I/O for peak performance.
  • Benchmark and tune HPC systems for specific workloads, ensuring sustained high performance.

GPU Cluster Management

  • Deploy and manage GPU clusters using orchestration and scheduling tools such as Kubernetes, Slurm, or similar platforms.
  • Monitor GPU utilization, thermals, and overall system health using tools such as NVIDIA DCGM, ROCm SMI, and Prometheus/Grafana.

Networking

  • Design and maintain high-speed networking solutions (e.g., NVLink, InfiniBand, RDMA) for distributed GPU systems.
  • Optimize data transfer between nodes and reduce latency in cluster communication.

Storage Solutions

  • Manage and configure storage solutions such as NVMe, SSD arrays, Ceph, or Lustre for high-throughput workloads.

Automation

  • Automate system deployment, updates, and monitoring using tools like Ansible, Terraform, or Python scripts.

Security

  • Implement secure access controls, firewalls, and VPNs to protect GPU resources and user data.
  • Ensure compliance with security best practices for HPC environments.

Hybrid/Cloud Integration

  • Manage integrations between on-premises GPU clusters and cloud platforms (e.g., AWS, GCP, Azure).
  • Build and maintain hybrid HPC setups for seamless scalability.

Data Center Infrastructure

  • Work on power, cooling, and rack design for HPC setups, ensuring reliable and efficient operations.
  • Deploy and maintain systems in on-premises or hybrid cloud data center environments.

Required Qualifications

Technical Skills

  • Strong experience with Linux (CentOS, Ubuntu, RHEL) and system-level configuration.
  • Expertise in managing NVIDIA GPU ecosystems (CUDA, NVLink, NVIDIA drivers).
  • Familiarity with AMD GPU software stacks (ROCm, HIP, or OpenCL).
  • Knowledge of high-speed networking protocols (InfiniBand, RDMA, Ethernet).
  • Proficiency in scripting and automation (Python, Bash, Ansible, Terraform).
  • Experience with job orchestration tools like Kubernetes or Slurm.
  • Familiarity with containerization (Docker, NVIDIA Container Toolkit, Singularity/Apptainer).
  • Understanding of storage technologies, including NVMe and parallel file systems.

Soft Skills

  • Strong analytical and problem-solving skills.
  • Ability to work independently and as part of a remote team.
  • Excellent communication skills for cross-team collaboration.

Preferred Qualifications

  • Experience with hybrid cloud setups, including AWS Outposts, Azure Stack, or GCP Anthos.
  • Hands-on experience with hardware management tools like IPMI/BMC for remote server management.
  • Familiarity with emerging accelerators (e.g., SambaNova, Cerebras, Graphcore).

What We Offer

  • Competitive salary and benefits package.
  • Work with a talented and collaborative team of engineers.
  • Opportunities to work on cutting-edge GPU and HPC projects.
  • A flexible and dynamic startup environment where you can grow and innovate.
  • Opportunities for professional development and continuous learning.