Senior HPC Engineer
1 week ago
Job Title: Senior Engineer-HPC
Department: Production & Support
Location: Faridabad

Position Summary:
Accomplished HPC Systems Engineer with 8–10 years of enterprise Linux administration and over 5 years of hands-on experience managing large-scale HPC clusters exceeding 500 cores and multi-petabyte storage environments. Proven expertise in designing, implementing, and optimizing HPC infrastructure, including compute, storage, and high-speed networking, to deliver maximum performance for demanding workloads.

Key Responsibilities:

HPC Cluster Management & Optimization
- Design, implement, and maintain HPC environments, including compute, storage, and network components.
- Configure and optimize Slurm, PBS Pro, or other workload managers/schedulers for efficient job scheduling and resource allocation.
- Implement performance tuning for CPU, GPU, memory, I/O, and network subsystems to meet workload demands.
- Manage HPC filesystem solutions such as Lustre, BeeGFS, or GPFS/Spectrum Scale.

Linux Administration
- Administer enterprise-grade Linux distributions (RHEL, CentOS, Rocky, Ubuntu) in large-scale compute environments.
- Manage kernel upgrades, patching, and security hardening.
- Troubleshoot kernel-level and system-level issues for performance and stability.

Automation & Configuration Management
- Develop and maintain Ansible playbooks/roles for automated provisioning, configuration, and patching of HPC systems.
- Integrate Ansible with CI/CD pipelines for infrastructure as code (IaC) practices.
- Automate cluster deployment and environment consistency across hundreds of nodes.

Monitoring, Troubleshooting & Support
- Implement and maintain monitoring tools (e.g., Grafana, Prometheus, Nagios, Ganglia).
- Troubleshoot complex HPC workloads, MPI communication issues, and application performance bottlenecks.
- Provide Tier-3 escalation support for Linux/HPC-related incidents.

Collaboration & Documentation
- Work closely with research teams, DevOps engineers, and system architects to deliver high-performance solutions.
- Document architecture, SOPs, troubleshooting guides, and performance tuning methodologies.

Requirements

Required Skills & Experience
- 8–10 years of hands-on Linux system administration experience in production environments.
- 5+ years managing HPC clusters at scale (500+ cores / multiple petabytes of storage).
- Strong Ansible automation skills (complex playbooks, roles, variables, templates).
- Deep understanding of MPI, OpenMP, and GPU/accelerator integration in HPC workloads.
- Proficient with HPC job schedulers (Slurm, PBS Pro, LSF).
- Experience with HPC storage (Lustre, BeeGFS, GPFS).
- Strong knowledge of TCP/IP networking, InfiniBand, and RDMA technologies.
- Experience with performance tuning and benchmarking tools (perf, HPCToolkit, Intel VTune, iperf, fio).
- Scripting proficiency in Bash, Python, or Perl for automation and tooling.

Preferred Qualifications
- Experience with containerized HPC (Singularity, Apptainer, or Podman).
- Familiarity with cloud-HPC integration (AWS ParallelCluster, Azure CycleCloud, GCP HPC).
- Knowledge of security compliance standards (CIS benchmarks, STIG).
- Contribution to HPC community tools or open-source projects.

Soft Skills
- Strong problem-solving and analytical thinking.
- Ability to mentor junior engineers and collaborate across teams.
- Excellent communication skills for technical and non-technical stakeholders.
-
Senior HPC Engineer
1 week ago
Faridabad, India Netweb Technologies India Ltd. Full time Job Title: Senior Engineer-HPC Department: Production & Support Location: Faridabad Position Summary: Accomplished HPC Systems Engineer with 8–10 years of enterprise Linux administration and over 5 years of hands-on experience managing large-scale HPC clusters exceeding 500 cores and multi-petabyte storage environments. Proven expertise in designing,...
-
Senior HPC Engineer
2 weeks ago
Faridabad, India Whatjobs IN C2 Full time Job Title: Senior Engineer-HPC Department: Production & Support Location: Faridabad Position Summary: Accomplished HPC Systems Engineer with 8–10 years of enterprise Linux administration and over 5 years of hands-on experience managing large-scale HPC clusters exceeding 500 cores and multi-petabyte storage environments. Proven expertise in designing,...
-
Senior HPC Engineer
1 week ago
Faridabad, Haryana, India Netweb Technologies India Ltd. Full time Job Title: Senior Engineer-HPC Department: Production & Support Location: Faridabad Position Summary: Accomplished HPC Systems Engineer with 8–10 years of enterprise Linux administration and over 5 years of hands-on experience managing large-scale HPC clusters exceeding 500 cores and multi-petabyte storage environments. Proven expertise in designing,...
-
HPC Infrastructure Head
1 week ago
Faridabad, India L&T Semiconductor Technologies Full time Job Title: HPC Infrastructure Head – L&T Semiconductor Technologies (LTSCT) Location: Bangalore Experience Level: 15+ years in IT/Unix Infrastructure with 5+ years in HPC leadership Employment Type: Full-time About the Role We are seeking an experienced HPC Infrastructure Head with a strong background in the semiconductor (Chip Design) industry to drive...
-
Testing and Certification Engineer
1 week ago
Faridabad, India Netweb Technologies India Ltd. Full time Job Summary: As a Testing & Certification Engineer at Netweb, you will be responsible for helping to drive demand and deliver projects. Within Netweb Technologies, the Product Engineering Group and Innovation Labs is a strategic business supporting the growing market demand for digital transformation. We are transforming the Storage and Computing space for more...