HPC Admin

2 days ago


Vijayawada, Andhra Pradesh, India Metasys Technologies Full time

Hyderabad (REMOTE)

Responsibilities:


• Administration of HPC and VDI clusters


• User Account management for HPC onboarding and offboarding


• Creation and maintenance of AMIs in AMI accounts


• Install, configure, and maintain Linux operating systems on HPC clusters.


• Support the necessary HPC components and native services of the platform by coordinating with the respective providers, e.g., EFPortal, AWS RES, CycleCloud, AWS ParallelCluster, etc.


• AWS Managed Active Directory support and management


• Continuous upgrades to the HPC platform and related components - OS, Java, Python, EFPortal, etc.


• Implement and maintain necessary compliance controls, i.e., US Export Control and Confidentiality. Conduct regular audits, share the findings, and implement corrective actions as required.


• Coordinate with other teams, such as the v-drive team, in testing and migrating/installing engineering applications on the platform.


• Manage job schedulers such as Slurm or LSF.


• Use node provisioning tools like Warewulf.


• Troubleshoot system issues and provide technical support to users.


• Monitor system performance and ensure optimal operation of the HPC environment.


• Collaborate with other IT professionals to integrate new technologies into the existing infrastructure.


• Progressive experience in HPC system administration, preferably in a Red Hat/CentOS Linux environment.


• Use AWS CloudFormation templates to build infrastructure for HPC and storage (Amazon FSx for NetApp ONTAP and Amazon FSx for Lustre).


