Lead HPCC

16 hours ago


Bareilly, India Taggd Full time

What you’ll do:


Responsibilities


Leadership and Strategy

Develop HPC and container platform roadmaps and strategies for growth based on business needs

Engage in and enhance the complete service lifecycle—from conceptualization and design to implementation and operation

Identify the growth path and scalability options of a solution and include these in design activities


Solution Design, Architecture and Planning


Gather requirements, assess technical feasibility, and design integrated HPC and container solutions that align with business objectives.

Architect and optimize the technical solutions to meet the requirements of the customer.

Identify the potential challenges and constraints that impact the solution and project plan.


Opportunity assessment


Respond to the technical sections of RFIs/RFPs and lead proof-of-concept engagements to a successful conclusion

Utilize an effective consultative approach to advance opportunities

Innovation and Research

Stay abreast of emerging technologies, trends, and industry developments related to HPC, Kubernetes, containers, cloud computing, and security.

Develop best practices, accelerators, and Show & Tell sessions for HPC and container platform solutions and integrations.


Customer-centric mindset


Strong focus on understanding customer business requirements and solving complex cloud technology issues

Be the trusted advisor, delight customers, and deliver exceptional customer experiences to drive customer success.

Communicate complex technical concepts and findings to non-technical stakeholders


Team Collaboration

Collaborate with cross-functional teams, including system administrators, developers, data scientists, and project managers, to ensure successful project delivery.

Understand the roles of other teams and resources within the company and engage them effectively

Mentor and train new team members and lead participation in tech talks, forums, and innovation initiatives.


Performance Optimization and Troubleshooting


Troubleshoot and resolve technical issues related to complete solutions.

Identify performance bottlenecks and provide remediations.


Project Delivery

Ability to lead technical projects by gathering the requirements, preparing the architecture / design and executing it end to end.

Must be able to bring clarity and drive complex projects involving multiple stakeholders

Solid business acumen and ability to converse with clients on issues and challenges


Technical Skills


Container Technologies and Orchestration Platform


In-depth knowledge and hands-on experience with containerization technologies like Docker or Podman

In-depth knowledge and hands-on experience with at least two (2) container orchestration technologies like CNCF Kubernetes, Red Hat OpenShift, SUSE Rancher RKE/K3s, Canonical Charmed Kubernetes, or HPE Ezmeral Runtime


Linux


Knowledge and experience with Linux System Administration, package management, scheduling, boot procedures/troubleshooting, performance optimization, and networking concepts

Good knowledge and hands-on experience with at least two Linux distributions, such as RHEL, SLES, Ubuntu, or Debian.


HPC

In-depth knowledge and hands-on experience with at least one (1) of the following HPC technologies: workload schedulers (Slurm, Altair PBS Pro) and cluster managers (HPCM, Bright Cluster Manager)

Good experience in performance optimization and health assessment of HPC components such as operating systems, storage, servers, parallel file systems, schedulers.

Good knowledge and hands-on experience with containerization technologies for HPC, like Singularity

Good knowledge of parallel computing and MPI technologies


Virtualization


Good knowledge and hands-on experience with virtualization technologies like KVM and OpenShift Virtualization


Programming Languages


Good experience with programming languages like Python

Good experience with scripting languages like Bash


Cloud Platforms


Good knowledge and hands-on experience with OpenStack cloud solutions

Good knowledge of any of the public cloud container services: AKS, EKS, GKE

Understanding of cloud infrastructure and services for scalable AI deployments.

Good understanding of cloud security and observability

Storage

In-depth knowledge and hands-on experience with CSI drivers

Good knowledge of storage concepts: block, file, and/or object storage (like MinIO)


Networks


Good knowledge of network protocols like TCP/IP, S3, FTP, NFS, or SMB/CIFS

Good knowledge of DNS, TCP/IP, routing, and load balancing


Networks - HPC


Good knowledge of the HPC networking stack (high-speed networking), such as InfiniBand.

GPU

Knowledge of GPU technologies, such as NVIDIA GPU Operator and NVIDIA vGPU technology


What you need to bring:

Qualifications:

  • Bachelor’s/Master’s degree in Computer Science, Information Technology, or a related field
  • Proven experience as a Solutions Architect, HPC and Container Platform Specialist, or similar role, with expertise in designing and implementing complex solutions
  • Red Hat certification (Red Hat Certified Specialist in Containers and Kubernetes, RHCSA, RHCE) or CNCF certification (CKA, CKAD, CKS) is preferred
  • Typically, 6-8 years of experience in delivering complex HPC and container platform projects
  • Excellent communication and presentation skills with the ability to convey complex technical concepts to non-technical stakeholders.