Senior Researcher – LLM Systems
7 days ago
Within our Microsoft-wide Systems Innovation initiative, we work to advance efficiency across AI systems, exploring novel designs and optimizations across the AI stack: models, AI frameworks, cloud infrastructure, and hardware. We are an applied research team driving mid- and long-term product innovation. We collaborate closely with research teams and product groups across the globe that bring deep expertise in cloud systems, machine learning, and software engineering. We communicate our research both internally and externally through academic publications, open-source releases, blog posts, patents, and industry conferences. We also collaborate with academic and industry partners to advance the state of the art and target material product impact reaching hundreds of millions of customers.
We are looking for a Senior Researcher – LLM Systems to invent, analyze, and productionize the next generation of serving architectures for transformer-based models across cloud and edge. The candidate will focus on algorithmic and systems innovations, including batching, routing, scheduling, caching, deployment safety, and endpoint configuration, that materially improve latency, throughput, cost, and reliability under real-world SLAs for Microsoft Copilots.
The ideal candidate brings a strong background in distributed systems, operating systems, and/or large-scale ML serving, plus the ambition to translate research into impact in production environments. This role blends rigorous research (theory + measurement) with hands-on engineering, and includes publishing papers, filing patents, and collaborating across research and product teams to advance the state of the art.
For background reading, see: Efficient AI - Microsoft Research
Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities:
- Invent and evaluate algorithms for dynamic batching, routing, and scheduling for transformer inference under multi-tenant SLOs and variable sequence lengths.
- Design and implement caching layers (e.g., KV cache paging/offload, prompt/result caching) and memory pressure controls to maximize GPU/accelerator utilization.
- Develop endpoint configuration policies (e.g., tensor/pipe parallelism, quantization/precision profiles, speculative decoding, chunked/streaming generation) and safe rollout mechanisms.
- Profile and optimize end-to-end serving pipelines: token-level latency, E2E p95/p99, throughput-per-$, cold-start behavior, warm pool strategy, and capacity planning.
- Collaborate with model, kernel, and hardware teams to align serving algorithms with attention/KV innovations and accelerator features.
- Publish research, file patents, and, where appropriate, contribute to open-source serving frameworks.
- Document designs, benchmarks, and operational playbooks; mentor junior researchers/engineers.
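To make the batching and scheduling responsibilities above concrete, here is a minimal, illustrative sketch (not a description of any Microsoft system) of SLO-aware batch formation: requests are ordered by earliest deadline and greedily packed into a batch under a per-step token budget. All names (`Request`, `form_batch`, the budget value) are hypothetical.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Request:
    deadline_ms: float                   # SLO deadline; earlier deadlines get priority
    tokens: int = field(compare=False)   # prompt length in tokens
    rid: str = field(compare=False)      # request id (not used for ordering)

def form_batch(queue, token_budget):
    """Greedily pack the earliest-deadline requests into one batch
    without exceeding the per-step token budget."""
    batch, used = [], 0
    while queue and used + queue[0].tokens <= token_budget:
        req = heapq.heappop(queue)       # pop the most urgent request
        batch.append(req)
        used += req.tokens
    return batch, used
```

A production scheduler would additionally handle continuous (iteration-level) batching, preemption, and fairness across tenants; this sketch only shows the core packing decision.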
Required Qualifications:
- Doctorate in a relevant field OR equivalent experience.
- Demonstrated expertise in queuing/scheduling theory and practical request orchestration under SLO constraints.
- Proficiency in C++ and Python for high-performance systems; strong code quality and profiling/debugging skills.
- Proven record of research impact (publications and/or patents) and shipping systems that run at scale.
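As a small example of the queuing-theory background asked for above, Little's Law (L = λW) bounds sustainable throughput from a concurrency limit and mean residence time. This is an illustrative helper, not part of the role's codebase; the function name is hypothetical.

```python
def max_sustainable_rate(concurrency_limit, avg_latency_s):
    """Little's Law (L = lambda * W): with at most L requests in flight and
    mean residence time W seconds, throughput lambda cannot exceed L / W."""
    return concurrency_limit / avg_latency_s
```

For example, an endpoint capped at 64 in-flight requests with 2 s average latency cannot sustain more than 32 requests/s, regardless of hardware.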
Other Requirements:
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Preferred Qualifications:
- Deep understanding of transformer inference efficiency techniques (attention, paged KV cache, speculative decoding, LoRA, sequence packing/continuous batching, quantization).
- Background in cost/performance modeling, autoscaling, and multi-region DR.
- Hands-on experience with inference serving frameworks (e.g., vLLM, Triton Inference Server, TensorRT-LLM, ONNX Runtime/ORT, Ray Serve, DeepSpeed-MII).
- Familiarity with GPU/accelerator memory management concepts to co-design cache/throughput policies.
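To illustrate the KV-cache and memory-management topics in the qualifications above, here is a back-of-envelope footprint estimate: the KV cache stores one key and one value vector per layer, per KV head, per token. This is a generic sizing sketch under standard transformer assumptions (grouped-query attention, fp16 weights); the parameter values in the example are hypothetical, not a specific model's.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, dtype_bytes=2):
    """Total KV cache size in bytes:
    2 (K and V) * layers * kv_heads * head_dim * tokens * bytes per element."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * dtype_bytes
```

For a hypothetical 32-layer model with 8 KV heads of dimension 128, a single 4096-token sequence in fp16 needs 512 MiB of cache, which is why paging and offload policies matter for utilization.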
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
#M365Core #M365Research #Research