Senior Data Center Engineer – AI/ML
2 days ago
Senior Data Center Engineer – AI/ML & GPU Platforms
Location: Remote
Experience: 7+ Years
Type: Full-time

Role Overview
We are seeking a highly skilled Senior Data Center Compute Engineer to design, build, and operate GPU-enabled compute platforms for AI/ML and high-performance workloads. This role is heavily focused on Kubernetes, virtualization, orchestration platforms, and GPU infrastructure, with responsibility for building and managing scalable, production-grade GPU compute fabrics. The ideal candidate will have deep hands-on experience in Kubernetes cluster deployment and lifecycle management, virtualization platforms, and GPU hardware management, enabling reliable and high-performance AI workloads across on-prem and hybrid data center environments.

Key Responsibilities
• Design, deploy, and operate GPU-enabled compute infrastructure for AI/ML, HPC, and accelerated workloads.
• Build and manage Kubernetes clusters at scale, including:
  • Cluster bootstrap, upgrades, and lifecycle management
  • High availability control planes and worker nodes
  • Multi-tenant and multi-cluster environments
• Implement GPU scheduling, isolation, and sharing within Kubernetes (MIG, device plugins, GPU operators).
• Deploy and manage virtualization platforms (VMware, KVM, OpenStack, or similar) supporting AI and container workloads.
• Design and operate compute orchestration platforms spanning VMs, containers, and bare-metal nodes.
• Integrate GPU servers (NVIDIA A100, H100, L40S, etc.) into Kubernetes and virtualization environments.
• Automate compute and cluster provisioning using Ansible, Terraform, Helm, and scripting (Bash/Python).
• Optimize compute performance, GPU utilization, and resource efficiency across clusters.
• Manage bare-metal provisioning, OS imaging, and firmware lifecycle for compute nodes.
• Collaborate with networking and storage teams to deliver a fully integrated AI compute fabric.
• Implement monitoring, logging, and capacity planning for compute and GPU resources.
• Maintain detailed documentation for cluster architecture, compute design, and operational runbooks.

Required Skills & Qualifications
• 7+ years of experience in data center compute or platform engineering roles.
• Strong expertise in Kubernetes deployment and management, including:
  • Production-grade cluster design
  • Upgrades, scaling, and troubleshooting
  • Kubernetes scheduling and resource management
• Hands-on experience with virtualization platforms such as VMware, KVM, OpenStack, or equivalent.
• Solid understanding of container runtimes, orchestration, and cloud-native architectures.
• Experience managing GPU hardware and drivers, including:
  • NVIDIA GPU installation and firmware
  • CUDA, NVIDIA drivers, and GPU operators
• Proficiency in automation and IaC tools (Ansible, Terraform, Helm).
• Strong Linux administration skills (RHEL, Ubuntu, CentOS).
• Experience with performance tuning and capacity planning for compute-intensive workloads.
• Excellent troubleshooting skills across OS, Kubernetes, virtualization, and GPU layers.

Preferred / Good to Have
• Experience building GPU compute fabrics / GPUaaS platforms.
• Knowledge of NVIDIA technologies such as MIG, NVLink, NVSwitch, GPUDirect, and CUDA ecosystems.
• Familiarity with containerized AI/ML frameworks (Kubeflow, Ray, MLflow).
• Exposure to bare-metal Kubernetes (RKE2, OpenShift, kubeadm, MAAS).
• Experience with monitoring and observability tools (Prometheus, Grafana).
• Understanding of hybrid cloud compute models and on-prem to cloud integrations.
• Kubernetes certifications (CKA / CKAD) or virtualization certifications are a plus.
-
Senior AI/ML Solutions Architect
2 weeks ago
Guntur, India beBeeEngineer Full time
The AI/ML Engineer will lead our data science team in leveraging AI and machine learning technologies to extract insights and build predictive models.
About the role: This senior member of our Data Science & AI Competency Center will guide delivery, coordinate workstreams and drive architecture and deployment of AI/ML solutions. Responsibilities include...
-
Senior AI Engineer
2 weeks ago
Guntur, India MindBrain Full time
Key Responsibilities
• Design, develop, and implement AI/ML algorithms and integrate them into software applications.
• Build robust and scalable APIs and services that support AI functionality.
• Collaborate with data scientists, backend/frontend engineers, and product managers to define requirements and deliver AI-driven features.
• Train, test, and...
-
AI/ML Intern
2 weeks ago
Guntur, India TalentXM (Formerly BlockTXM Inc) Full time
TalentXM, powered by BlockTXM Inc., is a next-generation AI-driven talent orchestration platform. We are redefining how talent and opportunities connect by combining AI innovation, orchestration, and future-of-work intelligence. As part of our journey, we are building the next wave of AI-powered products and workflows that will shape how organizations and...
-
Data-Driven AI/ML Platform Developer
1 week ago
Guntur, India beBeeMachineLearning Full time
Machine Learning Engineer
We're seeking a highly skilled Machine Learning Observability Platform Engineer to lead our AI/ML platform development. The ideal candidate will design, build, and maintain AI/ML features for an open-source Observability Platform built on Grafana and ClickHouse. This includes collaborating with SREs, service owners, and observability...
-
ML/AI Lead
2 weeks ago
Guntur, Andhra Pradesh, India maawaabro it solutions pvt ltd Full time
Job Description – Go Engineer
Immediate joining.
Employment Type: Full-time
Project: OTRAS – Next-Gen AI-based Government Exam & Recruitment Platform
ML/AI Lead (Next-Gen AI for OTRAS)
Role: ML/AI Lead
Experience: 7–12 Years
Salary: ₹upto – ₹2,50,000 per month
About the Role
OTRAS is building India's first next-gen AI-powered examination ecosystem,...
-
Senior AI Specialist
7 days ago
Guntur, India beBeeGenerative Full time
Job Opportunity for a Senior AI Specialist
We are seeking an experienced Artificial Intelligence Engineer to lead our team in developing cutting-edge Generative AI applications. The ideal candidate will have a strong background in software development, with at least 3 years of experience in AI/ML systems engineering or Generative AI. They should be proficient...
-
Guntur, India beBeeAIEngineer Full time
We are seeking an experienced AI/ML Professional to develop a sports companion chatbot for cricket and football applications. This innovative project aims to deliver actionable insights and solutions using machine learning algorithms and generative AI models. The ideal candidate will have extensive experience in building and optimizing ML models, fine-tuning...
-
AI/ML Model Developer
1 week ago
Guntur, India beBeeMachineLearning Full time
Key Role: AI/ML Model Developer
Job Description: The primary responsibility of this position involves the development, deployment, and maintenance of Artificial Intelligence (AI) and Machine Learning (ML) models utilizing Python and Django frameworks.
Core Responsibilities: Develop robust data pipelines and APIs that seamlessly integrate ML models into...
-
AI/ML Developer Specialist
2 weeks ago
Guntur, India beBeeExpertise Full time
Job Title: AI/ML Developer Expertise
About This Role
We are seeking a seasoned professional with expertise in designing and implementing advanced Salesforce solutions integrated with Artificial Intelligence and Machine Learning capabilities. The ideal candidate will develop intelligent workflows, personalized recommendations, and automation by leveraging...
-
ML Engineer – Google Cloud
2 weeks ago
Guntur, India Mastech Digital Full time
Job Title: ML Engineer – Google Cloud
Location: Chennai/Bangalore
Budget: Open to discuss
Notice Period: Immediate joiner, or serving notice with less than 60 days remaining
We are looking for an ML Engineer with expertise in building end-to-end machine learning workflows on Google Cloud. You will develop, train, and deploy ML models at...