On-Device Machine Learning Principal Engineer

2 weeks ago


Bengaluru, India Mulya Technologies Full time

Principal Machine Learning Engineer - Multimodal AI & Inference
Bangalore

Founded in 2023 by industry veterans, HQ in California, US

We are revolutionizing sustainable AI compute through intuitive software with composable silicon.

Overview:
You will design, optimize, and deploy large multimodal models (language, vision, audio, video) to run efficiently on a compact, high-performance AI appliance capable of supporting 100B+ parameter models at real-time speeds. Your mission is to deliver state-of-the-art multimodal inference locally through advanced model optimization, quantization, and system-level integration.

Key Responsibilities:

1. Model Integration & Porting
  • Optimize large-scale foundation models (e.g., Llama, gpt-oss, Whisper, HiDream, Qwen, Wan, etc.) for on-device inference.
  • Adapt pre-trained models for multimodal tasks (text, image, audio, video, or cross-modal reasoning).
  • Ensure seamless interoperability between modalities, e.g., enabling the system to “see, hear, and talk” naturally.

2. Model Optimization for Edge Hardware
  • Quantize and compress large models (4-bit or mixed precision) while maintaining high accuracy and low latency.
  • Implement and benchmark inference runtimes using frameworks like llama.cpp, Ollama, vLLM, ONNX, etc.
  • Collaborate with hardware engineers to co-design model architectures optimized for the appliance’s compute fabric.

3. Inference Pipeline Development
  • Build and maintain scalable, high-throughput inference pipelines capable of handling concurrent multimodal requests (text, audio, image, video).
  • Implement token streaming, caching, and scheduling strategies for real-time responses.
  • Develop APIs for low-latency local inference accessible via a web interface.

4. Evaluation & Benchmarking
  • Profile and benchmark performance (throughput, latency, energy efficiency) of deployed models.
  • Run regression tests to validate numerical accuracy after quantization or pruning.
  • Define KPIs for multimodal model performance under real-world usage.

5. Research & Prototyping
  • Investigate emerging multimodal architectures and lightweight model variants for local deployment.
  • Prototype hybrid models that combine LLMs, diffusion models, and ASR/TTS pipelines for advanced multimodal applications.
  • Stay current on state-of-the-art inference frameworks, compression techniques, and multimodal learning trends.

Required Qualifications:
  • Strong background in deep learning and model deployment, with hands-on experience in PyTorch and/or TensorFlow.
  • Expertise in model optimization: quantization, pruning, distillation, or mixed-precision inference.
  • Practical knowledge of inference engines (vLLM, llama.cpp, ONNX Runtime, or similar).
  • Experience deploying large models locally or on edge devices with limited memory/compute constraints.
  • Familiarity with multimodal model architectures, e.g., CLIP, Flamingo, LLaVA, or AudioGPT-style systems.
  • Strong software engineering skills (Python, C++, CUDA) and experience integrating models into production systems.
  • Understanding of GPU/accelerator utilization, memory bandwidth optimization, and distributed inference.

Preferred Qualifications:
  • 10+ years of experience.
  • Experience with model-parallel or tensor-parallel inference at scale.
  • Contributions to open-source inference frameworks or model serving systems.
  • Familiarity with hardware-aware training or co-optimization of neural networks and hardware.
  • Background in speech, vision, or multimodal ML research.
  • Track record of deploying models that run entirely offline or on embedded/edge systems.

Contact:
Uday
Mulya Technologies
muday_bhaskar@yahoo.com
"Mining The Knowledge Community"



  • Bengaluru, Karnataka, India Oracle Full time

    At Oracle Cloud Infrastructure (OCI), we build the future of the cloud for Enterprises as a diverse team of fellow creators and inventors. We act with the speed and attitude of a start-up, with the scale and customer-focus of the leading enterprise software company in the world. In the OCI AI Science org we are addressing exciting challenges at the...


  • Bengaluru, Karnataka, India Informatica Full time US$ 6,00,000 - US$ 12,00,000 per year

    Informatica is on a journey to use generative AI to simplify cloud data management. Principal Machine Learning Engineer will drive the overall architecture and pipelines to enable Machine Learning engineers to build, train and deploy Models at scale across multiple cloud services providers in a cloud agnostic manner. You will be visible and critical to...


  • Bengaluru, India Oracle Full time

    At Oracle Cloud Infrastructure (OCI), we build the future of the cloud for Enterprises as a diverse team of fellow creators and inventors. We act with the speed and attitude of a start-up, with the scale and customer-focus of the leading enterprise software company in the world. In the OCI AI Science org we are addressing exciting challenges at the...


  • Bengaluru, Karnataka, India Atlassian Full time

    Overview: Working at Atlassian. Atlassians can choose where they work – whether in an office, from home, or a combination of the two. That way, Atlassians have more control over supporting their family, personal goals, and other priorities. This is a remote position. To help our teams work together effectively, this role requires you to be located in...


  • Bengaluru, India Atlassian Full time

    Job Description: Working at Atlassian. Atlassians can choose where they work - whether in an office, from home, or a combination of the two. That way, Atlassians have more control over supporting their family, personal goals, and other priorities. This is a remote position. To help our teams work together effectively, this role requires you to be located in...


  • Bengaluru, India Oracle Full time

    Job Description: At Oracle Cloud Infrastructure (OCI), we build the future of the cloud for Enterprises as a diverse team of fellow creators and inventors. We act with the speed and attitude of a start-up, with the scale and customer-focus of the leading enterprise software company in the world. In the OCI AI Science org we are addressing exciting challenges...


  • Bengaluru, Karnataka, India Microsoft Full time

    Microsoft’s Cloud and AI group is at the forefront of cloud computing and artificial intelligence, driving innovation and large-scale AI implementation. The Customer Experience (CX) data team within this group is dedicated to fostering a data-driven culture. Our machine learning team conducts applied research in ML/AI, developing and deploying cutting-edge...


  • Bengaluru, Karnataka, India Oracle Full time

    Description: At Oracle Cloud Infrastructure (OCI), we build the future of the cloud for Enterprises as a diverse team of fellow creators and inventors. We act with the speed and attitude of a start-up, with the scale and customer-focus of the leading enterprise software company in the world. In the OCI AI Science org we are addressing exciting challenges at the...


  • Bengaluru, Karnataka, India Tricog Health Full time ₹ 6,00,000 - ₹ 18,00,000 per year

    We are looking for a curious and passionate Machine Learning Engineer to join our high-impact team. You'll work directly on ML models that analyze cardiac data, helping doctors save lives. This role offers the unique opportunity to see your work make a tangible difference in patient outcomes while building state-of-the-art ML infrastructure. What You'll...


  • Bengaluru, India Mulya Technologies Full time

    Principal Machine Learning Engineer - Multimodal AI & Inference. Bangalore. Founded in 2023 by industry veterans, HQ in California, US. We are revolutionizing sustainable AI compute through intuitive software with composable silicon. Overview: You will design, optimize, and deploy large multimodal models (language, vision, audio, video) to run efficiently on a...