Multimodal Vision LLM Engineer

3 days ago


Delhi, Delhi, India Full time ₹ 15,00,000 - ₹ 30,00,000 per year

About the Role

We are building the next generation of spatial intelligence, in which robots and 3D systems understand and interact with the world in real time. As a Multimodal LLM Engineer, you will design, train, and deploy vision-language models that understand detected objects, 3D environments, and dynamic scenes. Your work will enable robots and digital tools to reason about objects, context, safety, and actions entirely on-device.

You will collaborate closely with perception, robotics, and systems engineers to combine 3D vision, object detection, and LLM reasoning into a unified real-time intelligence engine.

This is a highly technical role with direct impact on core product capabilities.

Responsibilities

  • Develop and fine-tune multimodal LLMs (vision-language, 3D-language, object-context reasoning).
  • Build pipelines that fuse object detection, 3D data, bounding boxes, and sensor inputs into LLM tokens.
  • Architect models that interpret dynamic scenes, track changes, and deliver contextual reasoning.
  • Implement region-based reasoning, spatial attention, temporal understanding, and affordance prediction.
  • Train and optimize models using frameworks such as LLaVA, Qwen-VL, InternVL, CLIP/SigLIP, SAM, DETR, or custom backbones.
  • Convert raw perception output into structured representations (scene graphs, spatial embeddings).
  • Work with Robotics/Systems teams to integrate LLM reasoning into real-time pipelines (30–60 FPS).
  • Develop scalable data pipelines for multimodal datasets (images, detections, 3D meshes, text descriptions).
  • Perform model evaluation on context understanding, safety judgment, and action recommendation.
  • Collaborate on model compression and deployment for edge devices (Rockchip, Jetson, Apple M-series).

Minimum Qualifications

  • MS or PhD in Computer Science, AI/ML, Robotics, or a related field, or equivalent experience.
  • 3+ years of experience building deep learning models, including transformers.
  • Hands-on experience with multimodal models (VLMs) or LLM fine-tuning.
  • Strong understanding of one or more of the following:
      • Vision Transformers (ViT, SigLIP)
      • CLIP-style contrastive models
      • LLaVA / BLIP / Qwen-VL / InternVL
      • DETR / SAM / YOLO / 3D perception networks
  • Advanced Python and PyTorch skills.
  • Experience training models with large datasets and distributed systems.
  • Solid understanding of model architecture fundamentals (attention, tokenization, embeddings).

Job Type: Full-time

Pay: ₹15,00,000 - ₹30,00,000 per year

