Senior Data Center Engineer – AI/ML

3 weeks ago


India DC Tech Consulting Full time

Senior Data Center Engineer – AI/ML & GPU Platforms

Location: Remote
Experience: 7+ Years
Type: Full-time

Role Overview

We are seeking a highly skilled Senior Data Center Compute Engineer to design, build, and operate GPU-enabled compute platforms for AI/ML and high-performance workloads. This role is heavily focused on Kubernetes, virtualization, orchestration platforms, and GPU infrastructure, with responsibility for building and managing scalable, production-grade GPU compute fabrics.

The ideal candidate will have deep hands-on experience in Kubernetes cluster deployment and lifecycle management, virtualization platforms, and GPU hardware management, enabling reliable and high-performance AI workloads across on-prem and hybrid data center environments.

Key Responsibilities

- Design, deploy, and operate GPU-enabled compute infrastructure for AI/ML, HPC, and accelerated workloads.
- Build and manage Kubernetes clusters at scale, including:
  - Cluster bootstrap, upgrades, and lifecycle management
  - High availability control planes and worker nodes
  - Multi-tenant and multi-cluster environments
- Implement GPU scheduling, isolation, and sharing within Kubernetes (MIG, device plugins, GPU operators).
- Deploy and manage virtualization platforms (VMware, KVM, OpenStack, or similar) supporting AI and container workloads.
- Design and operate compute orchestration platforms spanning VMs, containers, and bare-metal nodes.
- Integrate GPU servers (NVIDIA A100, H100, L40S, etc.) into Kubernetes and virtualization environments.
- Automate compute and cluster provisioning using Ansible, Terraform, Helm, and scripting (Bash/Python).
- Optimize compute performance, GPU utilization, and resource efficiency across clusters.
- Manage bare-metal provisioning, OS imaging, and firmware lifecycle for compute nodes.
- Collaborate with networking and storage teams to deliver a fully integrated AI compute fabric.
- Implement monitoring, logging, and capacity planning for compute and GPU resources.
- Maintain detailed documentation for cluster architecture, compute design, and operational runbooks.

Required Skills & Qualifications

- 7+ years of experience in data center compute or platform engineering roles.
- Strong expertise in Kubernetes deployment and management, including:
  - Production-grade cluster design
  - Upgrades, scaling, and troubleshooting
  - Kubernetes scheduling and resource management
- Hands-on experience with virtualization platforms such as VMware, KVM, OpenStack, or equivalent.
- Solid understanding of container runtimes, orchestration, and cloud-native architectures.
- Experience managing GPU hardware and drivers, including:
  - NVIDIA GPU installation and firmware
  - CUDA, NVIDIA drivers, and GPU operators
- Proficiency in automation and IaC tools (Ansible, Terraform, Helm).
- Strong Linux administration skills (RHEL, Ubuntu, CentOS).
- Experience with performance tuning and capacity planning for compute-intensive workloads.
- Excellent troubleshooting skills across OS, Kubernetes, virtualization, and GPU layers.

Preferred / Good to Have

- Experience building GPU compute fabrics / GPUaaS platforms.
- Knowledge of NVIDIA technologies such as MIG, NVLink, NVSwitch, GPUDirect, and CUDA ecosystems.
- Familiarity with containerized AI/ML frameworks (Kubeflow, Ray, MLFlow).
- Exposure to bare-metal Kubernetes (RKE2, OpenShift, kubeadm, MAAS).
- Experience with monitoring and observability tools (Prometheus, Grafana).
- Understanding of hybrid cloud compute models and on-prem to cloud integrations.
- Kubernetes certifications (CKA / CKAD) or virtualization certifications are a plus.


  • AI/ML Engineer

    4 weeks ago


    India Data-Hat AI Full time

    Company Description: Generative AI solutions are reshaping how we work, and AI Agents are the future. Data-Hat AI assists enterprises in navigating the AI landscape and building profitable and scalable enterprise AI solutions. As transformation leaders explore the AI landscape, they seek experts to assist in developing solutions and building strategies. And...


  • India DC Tech Consulting Full time

    Senior Data Center Engineer – AI/ML & GPU Platforms. Location: Remote. Experience: 7+ Years. Type: Full-time. Role Overview: We are seeking a highly skilled Senior Data Center Compute Engineer to design, build, and operate GPU-enabled compute platforms for AI/ML and high-performance workloads. This role is heavily focused on Kubernetes, virtualization, orchestration...

  • AI/ML Engineer

    3 weeks ago


    India Lingaro Full time

    AI/ML Engineer – Senior Consultant. AI Engineering Group is part of the Data Science & AI Competency Center and focuses on the technical and engineering aspects of DS/ML/AI solutions. We are looking for experienced AI/ML Engineers to join our team to help us bring AI/ML solutions into production, automate processes, and define reusable best practices and...

  • AI/ML Engineer

    3 weeks ago


    India Lingaro Full time

    AI/ML Engineer – Senior Consultant. AI Engineering Group is part of the Data Science & AI Competency Center and focuses on the technical and engineering aspects of DS/ML/AI solutions. We are looking for experienced AI/ML Engineers to join our team to help us bring AI/ML solutions into production, automate processes, and define reusable best practices and...

  • AI/ML Engineer

    1 week ago


    India Lingaro Full time

    AI/ML Engineer – Senior Consultant. AI Engineering Group is part of the Data Science & AI Competency Center and focuses on the technical and engineering aspects of DS/ML/AI solutions. We are looking for experienced AI/ML Engineers to join our team to help us bring AI/ML solutions into production, automate processes, and define reusable best practices and...

  • AI/ML Engineer

    3 weeks ago


    India Lingaro Full time

    AI/ML Engineer – Senior Consultant. AI Engineering Group is part of the Data Science & AI Competency Center and focuses on the technical and engineering aspects of DS/ML/AI solutions. We are looking for experienced AI/ML Engineers to join our team to help us bring AI/ML solutions into production, automate processes, and define reusable best practices and...


  • India DC Tech Consulting Full time

    Location: Remote. Experience: 7+ Years. Type: Full-time. Role Overview: We are seeking a highly skilled Senior Linux Administrator with strong Data Center Networking expertise to join our AI/ML infrastructure team. The role focuses on designing, deploying, and operating on-premises Linux and Kubernetes environments optimized for AI/ML and high-performance computing...


  • India DC Tech Consulting Full time

    Location: Remote. Experience: 7+ Years. Type: Full-time. Role Overview: We are seeking a highly skilled Senior Linux Administrator with strong Data Center Networking expertise to join our AI/ML infrastructure team. The role focuses on designing, deploying, and operating on-premises Linux and Kubernetes environments optimized for AI/ML and high-performance...


  • india, IN DC Tech Consulting Full time

    Senior Data Center Engineer – AI/ML & GPU Platforms. Location: Remote. Experience: 7+ Years. Type: Full-time. Role Overview: We are seeking a highly skilled Senior Data Center Compute Engineer to design, build, and operate GPU-enabled compute platforms for AI/ML and high-performance workloads. This role is heavily focused on Kubernetes, virtualization, orchestration...


  • India Bharat Minds AI Full time

    Job Description. Internship Duration: 6 Months. First 3 Months: Unpaid Internship (Performance Evaluation). From 3rd or 4th Month Onwards: Paid Internship (10,000 to 25,000), based on performance and role. Qualifications: - Strong command of Data Cleaning and Exploratory Data Analysis (EDA) - Strong understanding of AI/ML concepts and workflows - Hands-on...