Lead Research Scientist, Speech and Audio Foundation Models

6 days ago


Bengaluru, Karnataka, India Krutrim Full time US$ 1,50,000 - US$ 2,00,000 per year

Location: Bangalore (India), Singapore and Palo Alto (CA, US)
Type of Job: Full-time

About Krutrim: Krutrim is building AI computing for the future. Our envisioned AI computing stack encompasses the AI computing infrastructure, AI Cloud, multilingual and multimodal foundational models, and AI-powered end applications. We are India's first AI unicorn and built the first foundation model from the country.

Our AI stack is empowering consumers, startups, enterprises and scientists across India and the world to build their end AI applications or AI models. While we are building foundational models across text, voice, and vision relevant to our focus markets, we are also developing AI training and inference platforms that enable AI research and development across industry domains. The platforms being built by Krutrim have the potential to impact millions of lives in India, across income and education strata, and across languages.

The team at Krutrim represents a convergence of talent across AI research, Applied AI, Cloud Engineering, and semiconductor design. Our teams operate from three locations: Bangalore, Singapore & San Francisco.

Job Description: We are seeking a highly skilled and experienced Senior Research Lead for Speech, Audio, and Conversational AI to join our innovative team.

In this role, you will spearhead the research and development of cutting-edge technologies in speech processing, text-to-speech (TTS), audio analysis, and real-time conversational AI. You will push the boundaries of what's possible in automatic speech recognition (ASR), speaker identification, diarization, speech synthesis, and audio generation. Working closely with a team of talented engineers and researchers, you'll design, implement, and optimize state-of-the-art systems that contribute to creating more natural, human-like, and high-quality speech and audio solutions for a variety of applications.

Key Responsibilities:
- Bring the state of the art in Audio/Speech and Large Language Models to develop advanced Audio Language Models and Speech Language Models.
- Research, architect, and deploy new generative AI methods such as autoregressive models, causal models, and diffusion models.
- Design and implement low-latency end-to-end models with multilingual speech/audio as both input and output.
- Conduct experiments to evaluate and improve the performance of these models, focusing on accuracy, naturalness, efficiency, and real-time capabilities across multiple languages.
- Stay at the forefront of advancements in speech processing, audio analysis, and large language models, integrating new techniques into our foundation models.
- Collaborate with cross-functional teams to integrate these foundation models into Krutrim's AI stack and products.
- Publish research findings in top-tier conferences and journals such as INTERSPEECH, ICASSP, ICLR, ICML, NeurIPS, and IEEE/ACM Transactions on Audio, Speech, and Language Processing.
- Mentor and guide junior researchers and engineers, fostering a collaborative and innovative team environment.
- Drive the adoption of best practices in model development, including rigorous testing, documentation, and ethical considerations in multilingual AI.

Qualifications:
- Ph.D. with 5 years or MS with 8 years of experience in Computer Science, Electrical Engineering, or a related field with a focus on speech processing, audio analysis, and machine learning.
- Experience training or fine-tuning speech/audio models for representation (such as W2V-BERT, SONAR, AST) and generation (such as HiFi-GAN, VQ-GAN, AudioLDM), as well as Conformers and multilingual multitask models (such as SeamlessM4T).
- Expertise with Audio Language Models such as AudioPaLM, Moshi, and SeamlessM4T.
- Proven track record of developing and applying novel neural network architectures such as Transformers, Mixture of Experts, Diffusion Models, and State Space Models (Mamba, Samba).
- Extensive experience in developing and optimizing models for low-latency, real-time applications.
- Strong background in multilingual speech recognition and synthesis, with an understanding of the challenges specific to different language families.
- Proficiency in deep learning frameworks (e.g., TensorFlow, PyTorch) and experience deploying large-scale speech and audio models.
- Demonstrated expertise in high-performance computing with proficiency in Python, C/C++, CUDA, and kernel-level programming for AI applications.
- Experience with audio signal processing techniques and their application in end-to-end neural models.
- Strong track record of publications in top AI conferences and journals, particularly in the areas of speech, audio, and language models.
- Excellent communication skills, with the ability to explain complex technical concepts to both technical and non-technical audiences.
- Passion for pushing the boundaries of what's possible in speech and audio AI, with a focus on practical, real-world applications.

Join Krutrim to shape the future of AI and make a significant impact on hundreds of millions of lives across India and the world. If you're passionate about pushing the boundaries of AI and want to work with a team at the forefront of innovation, we want to hear from you.



  • Bengaluru, Karnataka, India YAL Full time

    Job description: Lead Data Scientist, Speech Specialist (ASR). Location: Bangalore (India). Type: Full-Time | Immediate Joining Preferred. CTC: Competitive (25-50 LPA). About YAL.ai: YAL.ai, which stands for Your Alternative Life, is a revolutionary end-to-end communication and discovery platform that redefines how people connect, interact, and collaborate. Powered by...


  • Bengaluru, Karnataka, India Albatronix Full time ₹ 12,00,000 - ₹ 36,00,000 per year

    About The Opportunity: Join a high-velocity engineering team building robust, low-latency speech and voice solutions for large-scale deployments. You will design and ship state-of-the-art ASR models and production pipelines, bridging classical signal-processing foundations with modern transformer-based speech models to drive measurable product impact. ...


  • Bengaluru, Karnataka, India i Full time

    Position: Speech Data Scientist. Experience Level: 3-6 years. Location: Bangalore, India. Key Responsibilities: Core Development & Implementation: - Design and implement end-to-end speech analytics pipelines for production environments - Develop ASR engines using state-of-the-art frameworks (Wav2vec, Whisper, Deep Speech) with PyTorch or TensorFlow - Build and...


  • Bengaluru, Karnataka, India Scouto AI Full time ₹ 20,00,000 - ₹ 25,00,000 per year

    Core Speech Processing: 5 years of hands-on experience in speech recognition and processing. Deep understanding of classical methodologies: HMMs, GMMs, ANNs, language modeling. Expertise in modern deep learning techniques: CNNs, RNNs, LSTMs, CTC, attention mechanisms. Strong background in digital signal processing and audio analysis. Machine Learning & Deep...


  • Bengaluru, Karnataka, India Career Makers Full time ₹ 5,00,000 - ₹ 15,00,000 per year

    Research and Development - Audio Processing - Design and develop AI based Audio/Speech processing algorithms and solution software that meets consumer needs - Optimize AI solution on Company's proprietary hardware (DSP/NPU etc.), develop performance evaluation measures and evaluate - Explore areas to apply machine learning in audio and speech,...

  • Speech Scientist

    6 days ago


    Bengaluru, Karnataka, India Career Makers Full time ₹ 12,00,000 - ₹ 36,00,000 per year

    - Hands-on expertise in Artificial Intelligence, Machine Learning, Neural Networks, Human-Robot/Human-Computer interaction, data science, applied mathematics and Computer Vision. - As a Research Scientist, you will work on some challenging problems in Machine Learning, Natural Language Processing and Image Processing using cutting-edge Machine Learning...


  • Bengaluru, Karnataka, India Shashwath Solution Full time ₹ 15,00,000 - ₹ 25,00,000 per year

    Good programming knowledge: Python, C++. Python libraries: NumPy, Pandas. DL frameworks: PyTorch, Keras, TensorFlow, scikit-learn. AI concepts: strong fundamentals and working experience in Machine Learning (ML), Deep Learning (DL), Model Tuning, Optimization & Verification, and Transfer Learning. Excellent understanding of lower level fundamentals of...


  • Bengaluru, Karnataka, India Amazon Full time

    Description: Alexa is the voice-activated digital assistant powering devices like Amazon Echo, Echo Dot, Echo Show, and Fire TV, which are at the forefront of this latest technology wave. To preserve our customers' experience and trust, the Alexa Privacy team creates policies and builds services and tools through Machine Learning techniques to detect and...


  • Bengaluru, Karnataka, India Amazon Full time ₹ 12,00,000 - ₹ 36,00,000 per year

    Alexa+ is Amazon's next-generation, AI-powered virtual assistant. Building on the original Alexa, it uses generative AI to deliver a more conversational, personalized, and effective experience. The Alexa Sensitive Content Intelligence (ASCI) team is developing responsible AI (RAI) solutions for Alexa+, empowering it to provide useful information responsibly. The...