
HPC System Engineer
2 weeks ago
We are seeking an experienced HPC (High-Performance Computing) System Engineer to design,
implement, and manage cutting-edge HPC infrastructure using Dell servers, AMD GPUs (MI210),
and Pure Storage systems. The ideal candidate will have expertise in Commvault backup
systems, Kubernetes container orchestration, and multitenancy configurations, ensuring
scalable, GPU-accelerated, and high-performance solutions tailored to enterprise and HPC
workloads.
Key Responsibilities:
Dell Servers:
Architect and deploy HPC systems using Dell PowerEdge servers, ensuring high availability
and optimized performance for compute-intensive applications.
Manage server hardware lifecycle, including deployment, upgrades, and diagnostics.
Configure HPC cluster nodes for seamless integration with Kubernetes and GPU workloads.
AMD GPUs (MI210):
Deploy and optimize AMD GPU-based servers to accelerate AI/ML, HPC, and data-intensive
applications.
Monitor GPU utilization, troubleshoot performance bottlenecks, and optimize workloads for
GPU acceleration.
Integrate GPUs into Kubernetes environments for containerized GPU-based applications.
Pure Storage:
Design and manage Pure Storage solutions, including FlashBlade, to support HPC and
data-intensive workloads.
Implement multitenancy configurations for isolated, secure, and efficient resource
utilization.
Monitor storage health and ensure performance optimization for high-speed data access.
Commvault Backup:
Architect and manage enterprise-wide Commvault backup solutions, ensuring data integrity
and readiness for disaster recovery.
Implement backup and retention policies for HPC environments, including containerized and
GPU-accelerated workloads.
Kubernetes Container Management:
Deploy and manage Kubernetes clusters for HPC applications, ensuring scalability and fault
tolerance.
Configure persistent storage for containerized workloads and integrate storage with GPUs for
high-performance data processing.
Monitor cluster performance and troubleshoot HPC-specific Kubernetes challenges.
System Optimization and Monitoring:
Implement advanced monitoring solutions for servers, GPUs, storage, and Kubernetes
clusters to ensure peak performance.
Develop and enforce policies for system security, resource allocation, and compliance with
industry standards.
Lead capacity planning and scaling initiatives for HPC infrastructure.
Team Leadership and Collaboration:
Mentor and guide junior engineers on HPC best practices, system design, and
troubleshooting techniques.
Collaborate with cross-functional teams, including data scientists and DevOps, to align
infrastructure capabilities with organizational goals.
Qualifications:
Technical Skills:
Extensive experience with Dell PowerEdge servers in HPC or enterprise environments.
Proven expertise in AMD GPUs (MI210), including their integration and optimization for AI/ML
and HPC workloads.
Advanced knowledge of Pure Storage systems, including multitenancy and high-performance
configurations.
Expertise in Commvault backup systems, including design, deployment, and disaster
recovery.
Strong proficiency in Kubernetes container orchestration, particularly for GPU-accelerated
applications.
Knowledge of high-performance interconnects (e.g., RDMA, InfiniBand) and networking for
HPC.