Multimodal Model Expert

2 days ago

Vapi, Gujarat, India beBeeVlm Full time ₹ 1,20,00,000 - ₹ 2,00,00,000

Job Title: VLM Research Engineer
Location: Vapi, Gujarat
Employment Type: Full-Time

Overview

We are seeking a highly skilled expert to build multimodal (vision-language-action) models for instruction following, scene grounding, and tool use across platforms. The role involves developing advanced models that bridge perception and language understanding for autonomous systems.

Key Responsibilities

Pretrain and fine-tune VLMs, aligning them with robotics data including video, teleoperation, and language.
Build perception-to-language grounding for referring expressions, affordances, and task graphs.
Develop Toolformer-style and actuator interfaces that convert language intents into actionable skills and motion plans.
Create evaluation pipelines for instruction following, safety filters, and hallucination control.
Collaborate with cross-functional teams to integrate models into robotics platforms.

Must-Haves

Master's or PhD in a relevant field.
1–2+ years of experience in Computer Vision/Machine Learning.
Strong proficiency in PyTorch or JAX; experience with LLMs and VLMs.
Familiarity with multimodal datasets, distributed training, and RL/IL (reinforcement and imitation learning).

Nice-to-Haves

Experience with world models, diffusion-policy integration, and speech interfaces.
Familiarity with sim-to-real transfer in robotics applications.

Success Metrics

Success@k on language-based tasks.
Grounding precision and latency.
Sim-to-real performance retention.

Domain Notes

Humanoids:

Language-guided manipulation and tool use.

AGVs (Autonomous Ground Vehicles):

Natural language tasking for warehouse operations; semantic maps.

Cars:

Gestures and sign interpretation; driver interaction.

Drones:

Natural language mission specification; target search and inspection.

Application Instructions

Interested candidates may apply by sending their resume and cover letter.

Multimodal AI Software Modeler

2 days ago

Vapi, Gujarat, India beBeeModeler Full time ₹ 40,00,000 - ₹ 50,00,000

Software Modeler - Multimodal AI Expert: We seek a skilled model developer to create cutting-edge models for task-oriented dialogue systems, vision-language understanding, and multimodal perception. Main Responsibilities: Pretrain and fine-tune visual language models (VLMs) aligning them with robotics data including video, teleoperation, and language. Build...
Multimodal Model Developer

7 days ago

Vapi, Gujarat, India beBeeMachineLearning Full time ₹ 2,00,00,000 - ₹ 2,50,00,000

Research Engineer: We are seeking a highly skilled Research Engineer to build multimodal models for instruction following, scene grounding, and tool use across platforms. The role involves developing advanced models that bridge perception and language understanding for autonomous systems. Requirements: 12+ years of experience in Computer Vision/Machine...
Multimodal Modeling Specialist

2 days ago

Vapi, Gujarat, India beBeeMachineLearning Full time ₹ 1,50,00,000 - ₹ 2,00,00,000

Job Title: Vision-Led Model Research Engineer. Job Description: We are seeking a highly skilled Research Engineer to build multimodal models for instruction following, scene grounding, and tool use across platforms. The role involves developing advanced models that bridge perception and language understanding for autonomous systems. Key...
Researcher - Multimodal Model Development

1 day ago

Vapi, Gujarat, India beBeeMultimodal Full time US$ 1,20,000 - US$ 1,50,000

Job Summary: We are seeking a highly skilled researcher to build multimodal models for instruction following, scene grounding, and tool use across platforms. The role involves developing advanced models that bridge perception and language understanding for autonomous systems. About the Role: Develop vision-language models (VLMs) aligning them with robotics data...
Multimodal Vision-Language Expert

4 days ago

Vapi, Gujarat, India beBeeVisionLanguageModelEngineer Full time ₹ 9,00,000 - ₹ 12,00,000

Job Title: Vision-Language Model Engineer. About the Role: We are seeking a highly skilled Vision-Language Model (VLM) engineer to develop multimodal models for instruction following, scene grounding, and tool use across various platforms. The role involves designing advanced models that bridge perception and language understanding for autonomous systems. Key...
Multimodal AI Researcher

4 days ago

Vapi, Gujarat, India beBeeArtificialintelligence Full time ₹ 1,50,00,000 - ₹ 2,00,00,000

We are seeking a skilled Multimodal AI Researcher to develop and implement multimodal models for instruction following, scene grounding, and tool use across various platforms. Key Responsibilities: Pretrain and fine-tune VLMs aligning them with robotics data including video, teleoperation, and language. Build perception-to-language grounding for referring...
AI Multimodal Systems Specialist

4 days ago

Vapi, Gujarat, India beBeeVisionLanguage Full time ₹ 12,00,000 - ₹ 20,00,000

Advanced Multimodal Model Developer. Job Description: We are seeking a highly skilled developer to build multimodal models for instruction following, scene grounding, and tool use across platforms. The role involves developing advanced models that bridge perception and language understanding for autonomous systems. Key Responsibilities: Pretrain and finetune...
VLM Research Engineer

4 days ago

Vapi, Gujarat, India Meril Full time

Job Title: VLM Research Engineer Location: Vapi, Gujarat Employment Type: Full-Time Overview We are seeking a highly skilled VLM Research Engineer to build multimodal (vision-language-action) models for instruction following, scene grounding, and tool use across platforms. The role involves developing advanced models that bridge perception and...
VLM Research Engineer

5 days ago

Vapi, Gujarat, India Meril Full time

Job Title: VLM Research Engineer Location: Vapi, Gujarat Employment Type: Full-Time Overview We are seeking a highly skilled VLM Research Engineer to build multimodal (vision-language-action) models for instruction following, scene grounding, and tool use across platforms. The role involves developing advanced models that bridge perception and language...
VLM Research Engineer

2 days ago

Vapi, Gujarat, India Meril Full time

Job Description Job Title: VLM Research Engineer Location: Vapi, Gujarat Employment Type: Full-Time Overview We are seeking a highly skilled VLM Research Engineer to build multimodal (vision-language-action) models for instruction following, scene grounding, and tool use across platforms. The role involves developing advanced models that bridge perception and...
