Systems/Network Engineer – High-Performance Compute GPU Infrastructure
1 day ago
As a Systems/Network Engineer, you will be responsible for architecting, deploying, and maintaining GPU-based compute infrastructure. You will work on bare-metal systems, high-speed networks, and hybrid cloud integrations to ensure maximum performance, reliability, and scalability. This role is primarily remote but may occasionally require on-site support for hardware installations or emergency maintenance.
Key Responsibilities:
System Optimization
- Configure and optimize bare-metal servers, including Linux OS, NVIDIA/AMD GPU drivers, and system libraries.
- Fine-tune NUMA settings, CPU-GPU affinity, and storage I/O for peak performance.
- Benchmark and tune HPC systems for specific workloads, ensuring sustained high performance.
GPU Cluster Management
- Deploy and manage GPU clusters using job orchestration tools like Kubernetes, Slurm, or similar platforms.
- Monitor GPU utilization, thermals, and overall system health using tools like NVIDIA DCGM, ROCm, and Prometheus/Grafana.
Networking
- Design and maintain high-speed networking solutions (e.g., NVLink, InfiniBand, RDMA) for distributed GPU systems.
- Optimize data transfer between nodes and reduce latency in cluster communication.
Storage Solutions
- Manage and configure storage solutions such as NVMe, SSD arrays, Ceph, or Lustre for high-throughput workloads.
Automation
- Automate system deployment, updates, and monitoring using tools like Ansible, Terraform, or Python scripts.
Security
- Implement secure access controls, firewalls, and VPNs to protect GPU resources and user data.
- Ensure compliance with security best practices for HPC environments.
Hybrid/Cloud Integration
- Manage integrations between on-premises GPU clusters and cloud platforms (e.g., AWS, GCP, Azure).
- Build and maintain hybrid HPC setups for seamless scalability.
Data Center Infrastructure
- Work on power, cooling, and rack design for HPC setups, ensuring reliable and efficient operations.
- Deploy and maintain systems in on-premises or hybrid cloud data center environments.
Required Qualifications
Technical Skills
- Strong experience with Linux (CentOS, Ubuntu, RHEL) and system-level configuration.
- Expertise in managing NVIDIA GPU ecosystems (CUDA, NVLink, NVIDIA drivers).
- Familiarity with AMD ROCm, HIP, or OpenCL for AMD GPUs.
- Knowledge of high-speed networking protocols (InfiniBand, RDMA, Ethernet).
- Proficiency in scripting and automation (Python, Bash, Ansible, Terraform).
- Experience with job orchestration tools like Kubernetes or Slurm.
- Familiarity with containerization (Docker, NVIDIA Docker, Singularity).
- Understanding of storage technologies, including NVMe and parallel file systems.
Soft Skills
- Strong analytical and problem-solving skills.
- Ability to work independently and as part of a remote team.
- Excellent communication skills for cross-team collaboration.
Preferred Qualifications
- Experience with hybrid cloud setups, including AWS Outposts, Azure Stack, or GCP Anthos.
- Hands-on experience with hardware management tools like IPMI/BMC for remote server management.
- Familiarity with emerging accelerators (e.g., SambaNova, Cerebras, Graphcore).
What We Offer
- Competitive salary and benefits package.
- Work with a talented and collaborative team of engineers.
- Opportunities to work on cutting-edge GPU and HPC projects.
- A flexible and dynamic startup environment where you can grow and innovate.
- Opportunities for professional development and continuous learning.
-
Delhi, India | BitOoda | Full time
-
High-Performance Compute Engineer
8 hours ago
Delhi, Delhi, India | BitOoda | Full time
Job Overview: As a Systems/Network Engineer, you will be responsible for architecting, deploying, and maintaining high-performance compute infrastructure leveraging NVIDIA GPUs. This role involves working on bare-metal systems, high-speed networks, and hybrid cloud integrations to ensure maximum performance, reliability, and scalability. Key...
-
High-Performance Data Center Specialist
1 day ago
Delhi, Delhi, India | Vivekananda Institute of Professional Studies | Full time
About the Job: At Vivekananda Institute of Professional Studies, we are seeking a highly skilled and dedicated Data Center Engineer (NVIDIA Specialist) to join our team. This role involves the management, optimization, and maintenance of data center hardware and systems, with a specific focus on NVIDIA technologies such as GPUs and AI/ML infrastructure. Key...
-
GPU Optimization Engineer
13 hours ago
Delhi, India | BitOoda | Full time
Job Posting: GPU Optimization Engineer (Bare Metal Expertise). Location: Remote. Job Type: Full-Time. About Us: We are an innovative company at the forefront of high-performance computing (HPC) and AI, building cutting-edge solutions powered by GPUs and specialized accelerators. We’re looking for a highly skilled GPU Optimization Engineer to design, develop, and...
-
Senior Systems Engineer
3 weeks ago
Delhi, India | DC Tech Consulting | Full time
Job Profile: Senior Systems Engineer - Kubernetes & Linux Platform. Summary: An experienced Systems Engineer with over 10 years of specialized expertise in Linux platforms, Kubernetes cluster management, and advanced troubleshooting. Skilled in Kubernetes Day 2 operations, Linux networking, Linux storage, and Nvidia GPU configurations within Kubernetes...
-
High-Performance Infrastructure Engineer
8 hours ago
Delhi, Delhi, India | LinkedIn | Full time
As a Cloud-Native Systems Developer at LinkedIn, you will play a crucial role in building the next-generation infrastructure platforms. With a focus on information retrieval (IR), you will be part of a high-performing team that develops distributed databases built using Rust to support multiple retrieval use cases. Key Responsibilities: Design and build highly...
-
Delhi, India | ClearML | Full time
Information Technology Manager, AI Computing. Company Description: ClearML is a unified, open source platform for continuous AI/ML, trusted by forward-thinking Data Scientists, ML Engineers, DevOps, and decision makers at leading Fortune 500, enterprises, academia, and innovative start-ups worldwide. We enable customers to achieve the fastest time to production,...
-
High-Performance Backend Systems Engineer
2 weeks ago
Delhi, Delhi, India | Tykhe Inc | Full time
Job Title: High-Performance Backend Systems Engineer. About Us: Tykhe Inc is a cutting-edge company at the forefront of Generative Artificial Intelligence (GenAI). We're seeking an exceptional Product/Software Engineer-Backend to join our team in shaping the future of GenAI. This role offers exciting opportunities to work closely with cross-functional teams and...
-
Delhi, Delhi, India | Mulya Technologies | Full time
Mulya Technologies Seeks Experienced Professional. We are currently looking for a highly skilled Senior Microarchitecture Designer for High-Performance Systems to join our team at Mulya Technologies. About the Role: Design and integrate high-performance System on Chip, architecting SoCs for power, performance, and area efficiency. Develop microarchitecture and...
-
High-Performance AI Developer
2 weeks ago
Delhi, Delhi, India | AryaXAI | Full time
AryaXAI is a pioneer in AI innovation, driving the development of explainable, safe, and aligned systems for mission-critical businesses. We are seeking a highly skilled High-Performance AI Developer to join our team and push the boundaries of high-performance AI computation. In this role, you will design, develop, and optimize GPU kernels that power...
-
Delhi, Delhi, India | Mulya Technologies | Full time
High-Performance SoC Design Engineer. We are seeking a highly skilled Senior ASIC Design Engineer to join our team at Mulya Technologies in Santa Clara, California. About the Role: We are looking for candidates with expertise in Arm IP background, specifically CHI, CMN, and Arm CPUs. The ideal candidate will have experience designing and integrating...
-
High-Performance Network Architect
1 week ago
Delhi, Delhi, India | Gruve | Full time
Overview: Gruve is an innovative Software Services startup dedicated to empowering Enterprise Customers in managing their Data Life Cycle. We specialize in Cyber Security, Customer Experience, Infrastructure, and advanced technologies such as Machine Learning and Artificial Intelligence. Salary: $120,000 - $180,000 per annum (dependent on experience). About the...
-
Network and Infrastructure Engineer
14 hours ago
Delhi, India | 2gethr | Full time
About 2gethr: More than a co-working delight, 2gethr is the tale of creating a space for individuals & companies to chase their dreams & make them happen. 2gethr has to offer a combination of three elements: home, work & leisure. What we wanted from our space was to stir emotions within our members & employees; to become an emblem of dream starter. Our...
-
Data center engineer
1 day ago
Delhi, India | Vivekananda Institute Of Professional Studies | Full time
About the Job. Title: Data Centre Engineer (NVIDIA Specialist). Reports to: Director General. Location: VIPS Campus, Delhi. Apply by: 20th December, 2024. About VIPS: Summary: We are seeking a highly skilled and dedicated Data Center Engineer (NVIDIA Specialist) to join our team. This role involves the management, optimization, and maintenance of data center...
-
Greater Delhi Area, India | ClearML | Full time
Information Technology Manager, AI Computing. Company Description: ClearML is a unified, open source platform for continuous AI/ML, trusted by forward-thinking Data Scientists, ML Engineers, DevOps, and decision makers at leading Fortune 500, enterprises, academia, and innovative start-ups worldwide. We enable customers to achieve the fastest time to...