Senior Data Center Engineer – AI/ML

19 hours ago


India, IN DC Tech Consulting Full time
Senior Data Center Engineer – AI/ML & GPU Platforms

Location: Remote

Experience: 7+ Years

Type: Full-time

Role Overview

We are seeking a highly skilled Senior Data Center Compute Engineer to design, build, and operate GPU-enabled compute platforms for AI/ML and high-performance workloads. This role is heavily focused on Kubernetes, virtualization, orchestration platforms, and GPU infrastructure, with responsibility for building and managing scalable, production-grade GPU compute fabrics.

The ideal candidate will have deep hands-on experience in Kubernetes cluster deployment and lifecycle management, virtualization platforms, and GPU hardware management, and will apply that experience to run reliable, high-performance AI workloads across on-prem and hybrid data center environments.

Key Responsibilities
  • Design, deploy, and operate GPU-enabled compute infrastructure for AI/ML, HPC, and accelerated workloads.
  • Build and manage Kubernetes clusters at scale, including:
      • Cluster bootstrap, upgrades, and lifecycle management
      • High-availability control planes and worker nodes
      • Multi-tenant and multi-cluster environments
  • Implement GPU scheduling, isolation, and sharing within Kubernetes (MIG, device plugins, GPU operators).
  • Deploy and manage virtualization platforms (VMware, KVM, OpenStack, or similar) supporting AI and container workloads.
  • Design and operate compute orchestration platforms spanning VMs, containers, and bare-metal nodes.
  • Integrate GPU servers (NVIDIA A100, H100, L40S, etc.) into Kubernetes and virtualization environments.
  • Automate compute and cluster provisioning using Ansible, Terraform, Helm, and scripting (Bash/Python).
  • Optimize compute performance, GPU utilization, and resource efficiency across clusters.
  • Manage bare-metal provisioning, OS imaging, and firmware lifecycle for compute nodes.
  • Collaborate with networking and storage teams to deliver a fully integrated AI compute fabric.
  • Implement monitoring, logging, and capacity planning for compute and GPU resources.
  • Maintain detailed documentation for cluster architecture, compute design, and operational runbooks.

Required Skills & Qualifications
  • 7+ years of experience in data center compute or platform engineering roles.
  • Strong expertise in Kubernetes deployment and management, including:
      • Production-grade cluster design
      • Upgrades, scaling, and troubleshooting
      • Kubernetes scheduling and resource management
  • Hands-on experience with virtualization platforms such as VMware, KVM, OpenStack, or equivalent.
  • Solid understanding of container runtimes, orchestration, and cloud-native architectures.
  • Experience managing GPU hardware and drivers, including:
      • NVIDIA GPU installation and firmware
      • CUDA, NVIDIA drivers, and GPU operators
  • Proficiency in automation and IaC tools (Ansible, Terraform, Helm).
  • Strong Linux administration skills (RHEL, Ubuntu, CentOS).
  • Experience with performance tuning and capacity planning for compute-intensive workloads.
  • Excellent troubleshooting skills across OS, Kubernetes, virtualization, and GPU layers.

Preferred / Good to Have
  • Experience building GPU compute fabrics / GPUaaS platforms.
  • Knowledge of NVIDIA technologies such as MIG, NVLink, NVSwitch, GPUDirect, and the broader CUDA ecosystem.
  • Familiarity with containerized AI/ML frameworks (Kubeflow, Ray, MLflow).
  • Exposure to bare-metal Kubernetes (RKE2, OpenShift, kubeadm) and bare-metal provisioning tools such as MAAS.
  • Experience with monitoring and observability tools (Prometheus, Grafana).
  • Understanding of hybrid cloud compute models and on-prem to cloud integrations.
  • Kubernetes certifications (CKA / CKAD) or virtualization certifications are a plus.



  • Senior AI/ML Engineer

    3 weeks ago


    Gurugram, Haryana, India, IN NextDimension AI Full time

    Compensation: INR 12-30 LPA Base + Bonus + Equity. Location: Gurgaon. About Us: NextDimension is a US-based technology startup building AI Agents in Healthcare, established by a team of distinguished AI/ML Scientists and Engineers from Google, Amazon, and Snowflake. We're empowering Enterprises by building sophisticated, high-impact AI agents that automate sales,...


  • Lead AI/ML Engineer

    3 weeks ago


    India, IN Simelabs - Digital, AIML, Automation, Robotics, Gen AI. Full time

    About the Role: Architect scalable ML pipelines, services, and platforms using modern cloud and MLOps practices. Responsibilities: Build, fine-tune, and integrate Generative AI models (LLMs, Vision Models, Multimodal Models) into business applications. Work with agentic AI frameworks to design autonomous and semi-autonomous AI agents. Collaborate with...

  • AI and ML Manager

    3 weeks ago


    Gurugram, Haryana, India, IN Advanced AI research and product company Full time

    About the Company: Our client is an advanced AI research and product company focused on building intelligent systems that combine deep reasoning, natural language understanding, and adaptive learning. Its mission is to develop technologies that can seamlessly assist individuals and enterprises in decision-making, creativity, and automation. The company...

  • AI/ML Engineer

    2 weeks ago


    India, IN Edstem Technologies Full time

    We are seeking an AI/ML Engineer with over 5 years of expertise in conventional machine learning and experience or interest in generative AI to develop Sports Companion GPT - Aiko for sports applications, specifically targeting Cricket and Football. The ideal candidate will have extensive experience building and optimizing ML models and working with large...

  • Senior AI ML Engineer

    2 weeks ago


    India, IN Balancehero India Full time

    About Balancehero: Balancehero India Pvt. Ltd. (BHI) is the wholly owned subsidiary of Balancehero Co. Ltd., Korea, which runs and operates the mobile app “True Balance”, a one-stop destination for financial services. Founded by Charlie Lee in Korea in 2014, Balancehero started its operations in India in 2016. It started off as a balance check...

  • Associate Director

    3 weeks ago


    Gurugram, Haryana, India, IN Sirius AI Full time

    Role Overview: We are seeking a dynamic and visionary Associate Director to lead solutioning and innovation initiatives within our AI Innovations Lab. This role involves designing, delivering, and scaling AI/ML solutions for clients in the financial services ecosystem. The ideal candidate brings a mix of hands-on technical expertise, strategic thinking, and...

  • AI/ML Engineer

    3 weeks ago


    India, IN Innodata Inc. Full time

    About Us: Innodata is a global leader in digital transformation. Our AI-driven platforms and expert teams empower clients in healthcare, life insurance, and other industries to identify risks, improve efficiency, and make smarter decisions. By combining proprietary technology with deep domain expertise, we help businesses unlock the full potential of their...

  • Software Engineer

    3 weeks ago


    India, IN Mindfire Solutions Full time

    About the Job: As an AI/ML Engineer, you will be responsible for designing, validating, and integrating cutting-edge machine learning models and algorithms. Collaborate closely with cross-functional teams, including data scientists, to recognize and establish project objectives. Oversee data infrastructure maintenance, ensuring streamlined and scalable data...


  • Gurugram, Haryana, India, IN Sirius AI Full time

    Key Responsibilities: Engage with clients to understand their business objectives and challenges, providing data-driven recommendations and AI/ML solutions that enhance decision-making and deliver tangible value. Translate business needs, particularly within financial services domains such as marketing, risk, compliance and customer lifecycle management, into...