ML Platform Engineer
5 days ago
At eBay, we're more than a global ecommerce leader — we're changing the way the world shops and sells. Our platform empowers millions of buyers and sellers in more than 190 markets around the world. We're committed to pushing boundaries and leaving our mark as we reinvent the future of ecommerce for enthusiasts.
Our customers are our compass, authenticity thrives, bold ideas are welcome, and everyone can bring their unique selves to work — every day. We're in this together, sustaining the future of our customers, our company, and our planet.
Join a team of passionate thinkers, innovators, and dreamers — and help us connect people and build communities to create economic opportunity for all.
At eBay, we are building the next-generation AI platform to power intelligent experiences for millions of users worldwide. Our AI Platform (AIP) provides the scalable, secure, and efficient foundation for deploying and optimizing advanced machine learning and large language model (LLM) workloads at production scale. We enable teams across eBay to move from experimentation to global deployment with speed, reliability, and efficiency.
We are seeking an experienced Machine Learning Platform Support Engineer to join our AI Platform team. In this role, you will be the first line of support (L1) for ML workloads running on Ray and Kubernetes clusters. You will be responsible for triaging, monitoring, and resolving platform-related issues across ML training, inference, model deployment, and GPU resource allocation.
This position includes participation in on-call rotations (PagerDuty) and requires close collaboration with ML Platform engineers, researchers, and platform teams to ensure the reliability, scalability, and usability of the AI Platform. You will play a critical role in ensuring operational excellence and maintaining the uptime of the core infrastructure that powers eBay's global AI and ML systems.
What you will accomplish
- Serve as the first point of contact (L1) for all support requests related to the AI/ML Platform, including ML training, inference, model deployment, and GPU allocation.
- Provide operational and on-call (PagerDuty) support for Ray and Kubernetes clusters running distributed ML workloads across cloud and on-prem environments.
- Monitor, triage, and resolve platform incidents involving job failures, scaling errors, cluster instability, or GPU resource contention.
- Manage GPU quota allocation and scheduling across multiple user teams, ensuring compliance with approved quotas and optimal resource utilization.
- Support Ray Train/Tune for large-scale distributed training and Ray Serve for autoscaled inference, maintaining performance and service reliability.
- Troubleshoot Kubernetes workloads, including pod scheduling, networking, image issues, and resource exhaustion in multi-tenant namespaces.
- Collaborate with platform engineers, SREs, and ML practitioners to resolve infrastructure, orchestration, and dependency issues impacting ML workloads.
- Improve observability, monitoring, and alerting for Ray and Kubernetes clusters using Prometheus, Grafana, and OpenTelemetry to enable proactive issue detection.
- Maintain and enhance runbooks, automation scripts, and knowledge base documentation to accelerate incident resolution and reduce recurring support requests.
- Participate in root cause analysis (RCA) and post-incident reviews, contributing to platform improvements and automation initiatives to minimize downtime.
- Bachelor's or Master's degree in Computer Science, Engineering, or related technical discipline (or equivalent experience).
- 5+ years of experience in ML operations, DevOps, or platform support for distributed AI/ML systems.
- Proven experience providing L1/L2 and on-call support for Ray and Kubernetes-based clusters supporting ML training and inference workloads.
- Strong understanding of Ray cluster operations, including autoscaling, job scheduling, and workload orchestration across heterogeneous compute (CPU/GPU/accelerators).
- Hands-on experience managing Kubernetes control plane and data plane components, multi-tenant namespaces, RBAC, ingress, and resource isolation.
- Expertise in GPU scheduling, allocation, and monitoring (NVIDIA device plugin, MIG configuration, CUDA/NCCL optimization).
- Proficiency in Python and/or Go for automation, diagnostics, and operational tooling in distributed environments.
- Working knowledge of Kubernetes and cloud-native environments (AWS, GCP, Azure) and CI/CD pipelines.
- Experience with observability stacks (Prometheus, Grafana, OpenTelemetry) and incident management tools (PagerDuty, ServiceNow).
- Familiarity with ML frameworks such as TensorFlow and PyTorch, and their integration within distributed Ray/Kubernetes clusters.
- Strong debugging, analytical, and communication skills to collaborate effectively with cross-functional engineering and research teams.
- A customer-centric, operationally disciplined mindset focused on maintaining platform reliability, performance, and user satisfaction.
Please see the Talent Privacy Notice for information regarding how eBay handles your personal data collected when you use the eBay Careers website or apply for a job with eBay.
eBay is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, sex, sexual orientation, gender identity, veteran status, disability, or other legally protected status. If you have a need that requires accommodation, please contact us. We will make every effort to respond to your request for accommodation as soon as possible. View our accessibility statement to learn more about eBay's commitment to ensuring digital accessibility for people with disabilities.