BitOoda | Systems/Network Engineer – High-Performance Compute GPU Infrastructure | India
1 day ago
Role Overview
As a Systems/Network Engineer, you will be responsible for architecting, deploying, and maintaining GPU-based compute infrastructure. You will work on bare-metal systems, high-speed networks, and hybrid cloud integrations to ensure maximum performance, reliability, and scalability. This role is primarily remote but may occasionally require on-site support for hardware installations or emergency maintenance.
Key Responsibilities
System Optimization
- Configure and optimize bare-metal servers, including the Linux OS, NVIDIA/AMD GPU drivers, and system libraries.
- Fine-tune NUMA settings, CPU-GPU affinity, and storage I/O for peak performance.
- Benchmark and tune HPC systems for specific workloads, ensuring sustained high performance.
GPU Cluster Management
- Deploy and manage GPU clusters using orchestration and scheduling platforms such as Kubernetes or Slurm.
- Monitor GPU utilization, thermals, and overall system health using tools such as NVIDIA DCGM, AMD ROCm SMI, and Prometheus/Grafana.
Networking
- Design and maintain high-speed interconnects (e.g., NVLink within nodes; InfiniBand and RDMA across nodes) for distributed GPU systems.
- Optimize data transfer between nodes and reduce latency in cluster communication.
Storage Solutions
- Manage and configure storage solutions such as NVMe, SSD arrays, Ceph, or Lustre for high-throughput workloads.
Automation
- Automate system deployment, updates, and monitoring using tools like Ansible, Terraform, or Python scripts.
Security
- Implement secure access controls, firewalls, and VPNs to protect GPU resources and user data.
- Ensure compliance with security best practices for HPC environments.
Hybrid/Cloud Integration
- Manage integrations between on-premises GPU clusters and cloud platforms (e.g., AWS, GCP, Azure).
- Build and maintain hybrid HPC setups for seamless scalability.
Data Center Infrastructure
- Work on power, cooling, and rack design for HPC setups, ensuring reliable and efficient operations.
- Deploy and maintain systems in on-premises or hybrid cloud data center environments.
Required Qualifications
Technical Skills
- Strong experience with Linux (CentOS, Ubuntu, RHEL) and system-level configuration.
- Expertise in managing the NVIDIA GPU ecosystem (CUDA, NVLink, drivers).
- Familiarity with the AMD GPU software stack (ROCm, HIP) or with OpenCL.
- Knowledge of high-speed networking protocols (InfiniBand, RDMA, Ethernet).
- Proficiency in scripting and automation (Python, Bash, Ansible, Terraform).
- Experience with job orchestration tools like Kubernetes or Slurm.
- Familiarity with containerization (Docker, NVIDIA Container Toolkit, Singularity).
- Understanding of storage technologies, including NVMe and parallel file systems.
Soft Skills
- Strong analytical and problem-solving skills.
- Ability to work independently and as part of a remote team.
- Excellent communication skills for cross-team collaboration.
Preferred Qualifications
- Experience with hybrid cloud setups, including AWS Outposts, Azure Stack, or GCP Anthos.
- Hands-on experience with hardware management tools like IPMI/BMC for remote server management.
- Familiarity with emerging accelerators (e.g., SambaNova, Cerebras, Graphcore).
What We Offer
- Competitive salary and benefits package.
- Work with a talented and collaborative team of engineers.
- Opportunities to work on cutting-edge GPU and HPC projects.
- A flexible and dynamic startup environment where you can grow and innovate.
- Opportunities for professional development and continuous learning.