Senior Data Center Engineer – AI/ML

18 hours ago


Nadiad, India DC Tech Consulting Full time

Senior Data Center Engineer – AI/ML & GPU Platforms

Location: Remote
Experience: 7+ Years
Type: Full-time

Role Overview

We are seeking a highly skilled Senior Data Center Compute Engineer to design, build, and operate GPU-enabled compute platforms for AI/ML and high-performance workloads. This role is heavily focused on Kubernetes, virtualization, orchestration platforms, and GPU infrastructure, with responsibility for building and managing scalable, production-grade GPU compute fabrics.

The ideal candidate will have deep hands-on experience in Kubernetes cluster deployment and lifecycle management, virtualization platforms, and GPU hardware management, enabling reliable, high-performance AI workloads across on-prem and hybrid data center environments.

Key Responsibilities

• Design, deploy, and operate GPU-enabled compute infrastructure for AI/ML, HPC, and accelerated workloads.
• Build and manage Kubernetes clusters at scale, including:
  - Cluster bootstrap, upgrades, and lifecycle management
  - High-availability control planes and worker nodes
  - Multi-tenant and multi-cluster environments
• Implement GPU scheduling, isolation, and sharing within Kubernetes (MIG, device plugins, GPU operators).
• Deploy and manage virtualization platforms (VMware, KVM, OpenStack, or similar) supporting AI and container workloads.
• Design and operate compute orchestration platforms spanning VMs, containers, and bare-metal nodes.
• Integrate GPU servers (NVIDIA A100, H100, L40S, etc.) into Kubernetes and virtualization environments.
• Automate compute and cluster provisioning using Ansible, Terraform, Helm, and scripting (Bash/Python).
• Optimize compute performance, GPU utilization, and resource efficiency across clusters.
• Manage bare-metal provisioning, OS imaging, and firmware lifecycle for compute nodes.
• Collaborate with networking and storage teams to deliver a fully integrated AI compute fabric.
• Implement monitoring, logging, and capacity planning for compute and GPU resources.
• Maintain detailed documentation for cluster architecture, compute design, and operational runbooks.

Required Skills & Qualifications

• 7+ years of experience in data center compute or platform engineering roles.
• Strong expertise in Kubernetes deployment and management, including:
  - Production-grade cluster design
  - Upgrades, scaling, and troubleshooting
  - Kubernetes scheduling and resource management
• Hands-on experience with virtualization platforms such as VMware, KVM, OpenStack, or equivalent.
• Solid understanding of container runtimes, orchestration, and cloud-native architectures.
• Experience managing GPU hardware and drivers, including:
  - NVIDIA GPU installation and firmware
  - CUDA, NVIDIA drivers, and GPU operators
• Proficiency in automation and IaC tools (Ansible, Terraform, Helm).
• Strong Linux administration skills (RHEL, Ubuntu, CentOS).
• Experience with performance tuning and capacity planning for compute-intensive workloads.
• Excellent troubleshooting skills across OS, Kubernetes, virtualization, and GPU layers.

Preferred / Good to Have

• Experience building GPU compute fabrics / GPUaaS platforms.
• Knowledge of NVIDIA technologies such as MIG, NVLink, NVSwitch, GPUDirect, and the CUDA ecosystem.
• Familiarity with containerized AI/ML frameworks (Kubeflow, Ray, MLflow).
• Exposure to bare-metal Kubernetes and provisioning tooling (RKE2, OpenShift, kubeadm, MAAS).
• Experience with monitoring and observability tools (Prometheus, Grafana).
• Understanding of hybrid cloud compute models and on-prem-to-cloud integrations.
• Kubernetes certifications (CKA / CKAD) or virtualization certifications are a plus.



  • Nadiad, India beBeeSenior Full time

    We are seeking an experienced Senior AI/ML Engineer to join our team. As a key member of the engineering team, you will be responsible for designing and implementing distributed graph computing solutions that process billions of entities and relationships. The successful candidate will lead the development of entity resolution and network generation services,...

  • Senior AI/ML Lead

    2 weeks ago


    Nadiad, India beBeeAIlead Full time

    About Our AI/ML Leadership Role: As a senior software engineer, you will spearhead the design and implementation of advanced AI and machine learning models. Your responsibilities involve guiding a team of engineers to ensure successful deployment of projects leveraging AI/ML technologies to solve complex problems. You will collaborate closely with stakeholders...


  • Nadiad, India beBeeData Full time

    Senior AI/ML Cloud Engineer Role: Our company is seeking a Senior AI/ML Cloud Engineer to join our cutting-edge team. This role involves designing and implementing distributed entity resolution algorithms, building blocking strategies, creating AI/ML-based advanced matching with explainable AI (XAI), and implementing incremental resolution supporting real-time...

  • AI/ML Leader

    1 week ago


    Nadiad, India beBeeAIExpert Full time

    About the Role: We are seeking a visionary and skilled AI/ML Expert to lead our team of engineers in designing, developing, and implementing cutting-edge AI models. As a Senior Architect, you will be responsible for guiding the technical direction of AI-related projects, ensuring timely delivery, and driving innovation. Key Responsibilities: Collaborate with...


  • Nadiad, India beBeeDeveloper Full time

    Job Description: We are seeking an accomplished AI/ML Developer and seasoned Salesforce specialist to craft innovative solutions that harmonize Artificial Intelligence and Machine Learning capabilities within the Salesforce ecosystem. The ideal candidate will architect intelligent workflows, personalized recommendations, and automation by leveraging...

  • AI/ML Expert

    1 week ago


    Nadiad, India beBeeArtificialIntelligence Full time

    AI/ML Engineer. Job Summary: We are seeking a highly skilled AI/ML Engineer to develop, deploy, and optimize complex AI models for business problems. This role involves working with Generative AI, Retrieval-Augmented Generation (RAG), and Deep Learning techniques. Design and implement end-to-end ML pipelines using vector databases and retrieval...


  • Nadiad, India beBeeArtificial Full time

    About Our AI/ML Role: We're building innovative solutions with AI at the core. Our flagship platform enables enterprises to automate workflows, orchestrate data, and enhance customer engagement using NLP-driven capabilities. Our mission is to help organizations unlock measurable ROI and innovation through intelligent automation. AI/ML Engineer Role Overview: This...


  • Nadiad, India Insight Global Full time

    Agentic & AI Tech Ops Engineer. Location: AI Center of Excellence. Role Overview: We seek a proactive Agentic & AI Tech Ops Engineer to ensure the reliability, scalability, and efficiency of AI and Agentic AI systems in production. You will manage deployments, monitor performance, troubleshoot issues, and implement best practices for Tech Ops/MLOps/LLMOps. Key...


  • Nadiad, India beBeeArtificial Full time

    Senior AI Architect: Lead the development of cutting-edge Generative AI and LLM solutions to drive innovation in intelligent automation. Key Responsibilities: Define the technical vision, direction, and best practices for the Generative AI team. Identify and execute high-impact use cases for AI-driven transformation. Design, build, and deploy advanced generative...


  • Nadiad, India Larsen & Toubro Full time

    We are seeking a Machine Learning Principal Scientist to join our Electrolyzer Technology team. The ideal candidate will leverage advanced AI/ML techniques to model, predict, and optimize the performance and durability of electrolyzer stacks and materials. This role combines machine learning with electrochemistry and physics to accelerate discovery and...