Senior Data Center Engineer – AI/ML
4 days ago
Senior Data Center Engineer – AI/ML & GPU Platforms
Location: Remote
Experience: 7+ Years
Type: Full-time

Role Overview
We are seeking a highly skilled Senior Data Center Compute Engineer to design, build, and operate GPU-enabled compute platforms for AI/ML and high-performance workloads. This role is heavily focused on Kubernetes, virtualization, orchestration platforms, and GPU infrastructure, with responsibility for building and managing scalable, production-grade GPU compute fabrics. The ideal candidate will have deep hands-on experience in Kubernetes cluster deployment and lifecycle management, virtualization platforms, and GPU hardware management, enabling reliable and high-performance AI workloads across on-prem and hybrid data center environments.

Key Responsibilities
Design, deploy, and operate GPU-enabled compute infrastructure for AI/ML, HPC, and accelerated workloads.
Build and manage Kubernetes clusters at scale, including:
  Cluster bootstrap, upgrades, and lifecycle management
  High-availability control planes and worker nodes
  Multi-tenant and multi-cluster environments
Implement GPU scheduling, isolation, and sharing within Kubernetes (MIG, device plugins, GPU operators).
Deploy and manage virtualization platforms (VMware, KVM, OpenStack, or similar) supporting AI and container workloads.
Design and operate compute orchestration platforms spanning VMs, containers, and bare-metal nodes.
Integrate GPU servers (NVIDIA A100, H100, L40S, etc.) into Kubernetes and virtualization environments.
Automate compute and cluster provisioning using Ansible, Terraform, Helm, and scripting (Bash/Python).
Optimize compute performance, GPU utilization, and resource efficiency across clusters.
Manage bare-metal provisioning, OS imaging, and firmware lifecycle for compute nodes.
Collaborate with networking and storage teams to deliver a fully integrated AI compute fabric.
Implement monitoring, logging, and capacity planning for compute and GPU resources.
Maintain detailed documentation for cluster architecture, compute design, and operational runbooks.

Required Skills & Qualifications
7+ years of experience in data center compute or platform engineering roles.
Strong expertise in Kubernetes deployment and management, including:
  Production-grade cluster design
  Upgrades, scaling, and troubleshooting
  Kubernetes scheduling and resource management
Hands-on experience with virtualization platforms such as VMware, KVM, OpenStack, or equivalent.
Solid understanding of container runtimes, orchestration, and cloud-native architectures.
Experience managing GPU hardware and drivers, including:
  NVIDIA GPU installation and firmware
  CUDA, NVIDIA drivers, and GPU operators
Proficiency in automation and IaC tools (Ansible, Terraform, Helm).
Strong Linux administration skills (RHEL, Ubuntu, CentOS).
Experience with performance tuning and capacity planning for compute-intensive workloads.
Excellent troubleshooting skills across OS, Kubernetes, virtualization, and GPU layers.

Preferred / Good to Have
Experience building GPU compute fabrics / GPUaaS platforms.
Knowledge of NVIDIA technologies such as MIG, NVLink, NVSwitch, GPUDirect, and CUDA ecosystems.
Familiarity with containerized AI/ML frameworks (Kubeflow, Ray, MLflow).
Exposure to bare-metal Kubernetes (RKE2, OpenShift, kubeadm, MAAS).
Experience with monitoring and observability tools (Prometheus, Grafana).
Understanding of hybrid cloud compute models and on-prem to cloud integrations.
Kubernetes certifications (CKA / CKAD) or virtualization certifications are a plus.
-
AI/ML Engineer
2 weeks ago
India Data-Hat AI Full time Company Description Generative AI solutions are reshaping how we work, and AI Agents are the future. Data-Hat AI assists Enterprises in navigating the AI landscape and building profitable and scalable Enterprise AI solutions. As transformation leaders explore the AI landscape, they seek experts to assist in developing solutions and building strategies. And...
-
Senior Data Center Engineer – AI/ML
5 days ago
India DC Tech Consulting Full time Senior Data Center Engineer – AI/ML & GPU Platforms Location: Remote Experience: 7+ Years Type: Full-time Role Overview We are seeking a highly skilled Senior Data Center Compute Engineer to design, build, and operate GPU-enabled compute platforms for AI/ML and high-performance workloads. This role is heavily focused on Kubernetes, virtualization, orchestration...
-
Senior AI/ML Engineer
3 weeks ago
Gurugram, Haryana, India, IN NextDimension AI Full time Compensation: INR 12-30 LPA Base + Bonus + Equity Location: Gurgaon About Us NextDimension is a US-based technology startup building AI Agents in Healthcare, established by a team of distinguished AI/ML Scientists and Engineers from Google, Amazon, and Snowflake. We're empowering Enterprises by building sophisticated, high-impact AI agents that automate sales,...
-
AI/ML Engineer
7 days ago
India Lingaro Full time AI/ML Engineer – Senior Consultant AI Engineering Group is part of the Data Science & AI Competency Center and focuses on the technical and engineering aspects of DS/ML/AI solutions. We are looking for experienced AI/ML Engineers to join our team to help us bring AI/ML solutions into production, automate processes, and define reusable best practices and...
-
AI/ML Engineer
3 weeks ago
India Lingaro Full time AI/ML Engineer – Senior Consultant AI Engineering Group is part of the Data Science & AI Competency Center and focuses on the technical and engineering aspects of DS/ML/AI solutions. We are looking for experienced AI/ML Engineers to join our team to help us bring AI/ML solutions into production, automate processes, and define reusable best practices and...
-
Senior ML Engineer
3 weeks ago
Delhi, India Soojh AI Full time Job Description Company Description Soojh AI is a leading AI services and consulting firm dedicated to helping businesses leverage the power of Artificial Intelligence and Generative AI for scalable efficiency. We offer end-to-end expertise for integrating AI into operations, building AI-driven products, or setting up in-house AI Labs and AI teams. Our track...
-
AI/ML Engineer
6 days ago
India Lingaro Full time AI/ML Engineer – Senior Consultant AI Engineering Group is part of the Data Science & AI Competency Center and focuses on the technical and engineering aspects of DS/ML/AI solutions. We are looking for experienced AI/ML Engineers to join our team to help us bring AI/ML solutions into production, automate processes, and define reusable best practices and...