Multimodal AI Software Modeler

3 days ago


Vapi, Gujarat, India beBeeModeler Full time ₹ 40,00,000 - ₹ 50,00,000
Software Modeler - Multimodal AI Expert

We seek a skilled model developer to create cutting-edge models for task-oriented dialogue systems, vision-language understanding, and multimodal perception.

Main Responsibilities
  • Pretrain and fine-tune vision-language models (VLMs), aligning them with robotics data including video, teleoperation, and language.
  • Build perception-to-language grounding for referring expressions, affordances, and task graphs.
  • Develop interfaces to convert language intents into actionable skills and motion plans.
  • Create evaluation pipelines for instruction following, safety filters, and hallucination control.
Necessary Qualifications
  • Master's or PhD in a relevant field.
  • 12+ years of experience in Computer Vision/Machine Learning.
  • Strong proficiency in PyTorch or JAX; experience with LLMs and VLMs.
  • Familiarity with multimodal datasets, distributed training, and RL/IL.
Bonus Requirements
  • Experience with world models, diffusion-policy integration, and speech interfaces.
  • Familiarity with sim-to-real transfer in robotics applications.

Key performance metrics include Success@k on language-based tasks, grounding precision and latency, and sim-to-real performance retention.



  • Vapi, Gujarat, India beBeeArtificialintelligence Full time ₹ 1,50,00,000 - ₹ 2,00,00,000

    We are seeking a skilled Multimodal AI Researcher to develop and implement multimodal models for instruction following, scene grounding, and tool use across various platforms. Key Responsibilities: Pretrain and fine-tune VLMs, aligning them with robotics data including video, teleoperation, and language. Build perception-to-language grounding for referring...


  • Vapi, Gujarat, India beBeeMachineLearning Full time ₹ 2,00,00,000 - ₹ 2,50,00,000

    Research Engineer. We are seeking a highly skilled Research Engineer to build multimodal models for instruction following, scene grounding, and tool use across platforms. The role involves developing advanced models that bridge perception and language understanding for autonomous systems. Requirements: 12+ years of experience in Computer Vision/Machine...


  • Vapi, Gujarat, India beBeeMachineLearning Full time ₹ 1,50,00,000 - ₹ 2,00,00,000

    Job Title: Vision-Led Model Research Engineer. Job Description: We are seeking a highly skilled Research Engineer to build multimodal models for instruction following, scene grounding, and tool use across platforms. The role involves developing advanced models that bridge perception and language understanding for autonomous systems. Key...


  • Vapi, Gujarat, India beBeeVisionLanguage Full time ₹ 12,00,000 - ₹ 20,00,000

    Advanced Multimodal Model Developer. Job Description: We are seeking a highly skilled developer to build multimodal models for instruction following, scene grounding, and tool use across platforms. The role involves developing advanced models that bridge perception and language understanding for autonomous systems. Key Responsibilities: Pretrain and finetune...


  • Vapi, Gujarat, India beBeeVlm Full time ₹ 1,20,00,000 - ₹ 2,00,00,000

    Job Title: VLM Research Engineer. Location: Vapi, Gujarat. Employment Type: Full-Time. Overview: We are seeking a highly skilled expert in multimodal (vision-language-action) models to build instruction following, scene grounding, and tool use across platforms. The role involves developing advanced models that bridge perception and language...


  • Vapi, Gujarat, India beBeeMultimodal Full time US$ 1,20,000 - US$ 1,50,000

    Job Summary: We are seeking a highly skilled researcher to build multimodal models for instruction following, scene grounding, and tool use across platforms. The role involves developing advanced models that bridge perception and language understanding for autonomous systems. About the Role: Develop vision-language models (VLMs), aligning them with robotics data...


  • Vapi, Gujarat, India beBeeVisionLanguageModelEngineer Full time ₹ 9,00,000 - ₹ 12,00,000

    Job Title: Vision-Language Model Engineer. About the Role: We are seeking a highly skilled Vision-Language Model (VLM) engineer to develop multimodal models for instruction following, scene grounding, and tool use across various platforms. The role involves designing advanced models that bridge perception and language understanding for autonomous systems. Key...

  • AI Research Scientist

    8 hours ago


    Vapi, Gujarat, India Credartha Fin Solution Full time ₹ 1,04,000 - ₹ 1,30,878 per year

    Implement and innovate upon state-of-the-art AI frameworks. You will be working hands-on to build systems for: multimodal visual understanding; transparent and verifiable logical reasoning; powerful, internet-enabled tool-using agents. Job/soft skill training


  • Vapi, Gujarat, India Meril Full time ₹ 15,00,000 - ₹ 20,00,000 per year

    Job Title: Senior Data Scientist – NLP & Computer Vision. Location: Vapi. About Us: (Your Company Name) is an innovation-driven organization focused on solving complex real-world problems through cutting-edge AI and ML technologies. We are looking for a Senior Data Scientist who thrives at the intersection of language and vision, someone passionate about...

  • VLM Research Engineer

    12 hours ago


    Vapi, Gujarat, India Meril Full time

    Job Title: VLM Research Engineer. Location: Vapi, Gujarat. Employment Type: Full-Time. Overview: We are seeking a highly skilled VLM Research Engineer to build multimodal (vision-language-action) models for instruction following, scene grounding, and tool use across platforms. The role involves developing advanced models that bridge perception and language...