Senior Data Center Engineer – AI/ML

3 days ago


Bangalore, India DC Tech Consulting Full time

Senior Data Center Engineer – AI/ML & GPU Platforms

Location: Remote
Experience: 7+ years
Type: Full-time

Role Overview

We are seeking a highly skilled Senior Data Center Compute Engineer to design, build, and operate GPU-enabled compute platforms for AI/ML and high-performance workloads. The role focuses heavily on Kubernetes, virtualization, orchestration platforms, and GPU infrastructure, and carries responsibility for building and managing scalable, production-grade GPU compute fabrics.

The ideal candidate has deep hands-on experience in Kubernetes cluster deployment and lifecycle management, virtualization platforms, and GPU hardware management, and can deliver reliable, high-performance AI workloads across on-prem and hybrid data center environments.

Key Responsibilities

• Design, deploy, and operate GPU-enabled compute infrastructure for AI/ML, HPC, and accelerated workloads.
• Build and manage Kubernetes clusters at scale, including:
  - Cluster bootstrap, upgrades, and lifecycle management
  - High-availability control planes and worker nodes
  - Multi-tenant and multi-cluster environments
• Implement GPU scheduling, isolation, and sharing within Kubernetes (MIG, device plugins, GPU operators).
• Deploy and manage virtualization platforms (VMware, KVM, OpenStack, or similar) that support AI and container workloads.
• Design and operate compute orchestration platforms spanning VMs, containers, and bare-metal nodes.
• Integrate GPU servers (NVIDIA A100, H100, L40S, etc.) into Kubernetes and virtualization environments.
• Automate compute and cluster provisioning using Ansible, Terraform, Helm, and scripting (Bash/Python).
• Optimize compute performance, GPU utilization, and resource efficiency across clusters.
• Manage bare-metal provisioning, OS imaging, and firmware lifecycle for compute nodes.
• Collaborate with networking and storage teams to deliver a fully integrated AI compute fabric.
• Implement monitoring, logging, and capacity planning for compute and GPU resources.
• Maintain detailed documentation for cluster architecture, compute design, and operational runbooks.

Required Skills & Qualifications

• 7+ years of experience in data center compute or platform engineering roles.
• Strong expertise in Kubernetes deployment and management, including:
  - Production-grade cluster design
  - Upgrades, scaling, and troubleshooting
  - Kubernetes scheduling and resource management
• Hands-on experience with virtualization platforms such as VMware, KVM, OpenStack, or equivalent.
• Solid understanding of container runtimes, orchestration, and cloud-native architectures.
• Experience managing GPU hardware and drivers, including:
  - NVIDIA GPU installation and firmware
  - CUDA, NVIDIA drivers, and GPU operators
• Proficiency in automation and IaC tools (Ansible, Terraform, Helm).
• Strong Linux administration skills (RHEL, Ubuntu, CentOS).
• Experience with performance tuning and capacity planning for compute-intensive workloads.
• Excellent troubleshooting skills across the OS, Kubernetes, virtualization, and GPU layers.

Preferred / Good to Have

• Experience building GPU compute fabrics or GPU-as-a-Service (GPUaaS) platforms.
• Knowledge of NVIDIA technologies such as MIG, NVLink, NVSwitch, GPUDirect, and the CUDA ecosystem.
• Familiarity with containerized AI/ML frameworks (Kubeflow, Ray, MLflow).
• Exposure to bare-metal Kubernetes (RKE2, OpenShift, kubeadm) and bare-metal provisioning with MAAS.
• Experience with monitoring and observability tools (Prometheus, Grafana).
• Understanding of hybrid cloud compute models and on-prem-to-cloud integrations.
• Kubernetes certifications (CKA / CKAD) or virtualization certifications are a plus.


  • AI/ML Engineer

    2 weeks ago


    Bangalore, India Data-Hat AI Full time

    Company Description: Generative AI solutions are reshaping how we work, and AI agents are the future. Data-Hat AI helps enterprises navigate the AI landscape and build profitable, scalable enterprise AI solutions. As transformation leaders explore the AI landscape, they seek experts to help develop solutions and build strategies. And...

  • AI/ML Engineer

    5 days ago


    Bangalore, India Lingaro Full time

    AI/ML Engineer – Senior Consultant. AI Engineering Group is part of the Data Science & AI Competency Center and focuses on the technical and engineering aspects of DS/ML/AI solutions. We are looking for experienced AI/ML Engineers to join our team to help us bring AI/ML solutions into production, automate processes, and define reusable best practices and...


  • Bangalore, India DC Tech Consulting Full time

    Location: Remote | Experience: 7+ Years | Type: Full-time | Role Overview: We are seeking a highly skilled Senior Linux Administrator with strong Data Center Networking expertise to join our AI/ML infrastructure team. The role focuses on designing, deploying, and operating on-premises Linux and Kubernetes environments optimized for AI/ML and high-performance computing...

  • AI/ML Engineer

    2 weeks ago


    Bangalore, India Lingaro Full time

    Role: AI/ML Engineer – Lead AI DevOps Engineer | Location: India, Remote | About Lingaro: Lingaro Group is the end-to-end data services partner to global brands and enterprises. We lead our clients through their data journey, from strategy through development to operations and adoption, helping them to realize the full value of their data. Since 2008, Lingaro...

  • AI/ML Engineer

    1 week ago


    Bangalore, India Lingaro Full time

    Role: AI/ML Engineer – Lead AI DevOps Engineer | Location: India, Remote | About Lingaro: Lingaro Group is the end-to-end data services partner to global brands and enterprises. We lead our clients through their data journey, from strategy through development to operations and adoption, helping them to realize the full value of their data. Since 2008, Lingaro...


  • Bangalore, India Sigmatic Full time

    Senior AI and ML Engineer – Sigmatic.ai | Location: Remote / Hybrid (Delhi) | Type: Full-Time | Company Overview: Sigmatic is redefining operational excellence for ambulatory surgery centers (ASCs) through AI-driven automation and intelligent workflow platforms. Our mission is to eliminate manual processes, unlock real-time decision-making, and empower ASCs to...


  • Bangalore, India Hodos360.ai Full time

    Job Title: Senior ML Engineer – LLMs | Company: Hodos 360 | Location: Bangalore (Work From Office – WFO) | Shift: 5:00 PM to 2:00 AM IST | Experience: 3–8 years | About Hodos 360: Hodos 360 builds AI-driven products and solutions for global businesses, with a strong focus on Large Language Models (LLMs) and automation. We work closely with clients to understand...


  • Bangalore, India Circuitry.ai Full time

    Job Title: Senior AI/ML Engineer | Location: Hyderabad, India – Hybrid Remote (3 days a week onsite) | Company Overview: Circuitry.ai is at the forefront of artificial intelligence innovation, specializing in developing AI-driven software solutions that transform how industries operate. By harnessing the power of machine learning, predictive modelling, and...