Inference Optimization Engineer (LLM and Runtime)
5 days ago
Location: Bengaluru, Karnataka
About the Company:
Sustainability is a global organization, pioneering the convergence of clean energy and AI, enabling profitable energy transitions while powering end-to-end AI infrastructure. By integrating AI-driven cloud solutions with sustainable energy, we create scalable, intelligent ecosystems that drive efficiency, innovation, and long-term impact across industries. Guided by exceptional leaders and visionaries with decades of expertise in finance, policy, technology, and innovation, we are committed to making long-term efforts to fulfil this vision through our technical innovation, client services, expertise, and capability expansion.
Role Summary:
We are seeking a highly skilled and innovative Inference Optimization Engineer (LLM and Runtime) to design, develop, and optimize cutting-edge AI systems that power intelligent, scalable, and agent-driven workflows. This role blends the frontier of generative AI research with robust engineering, requiring expertise in machine learning, deep learning, and large language models (LLMs), along with a working knowledge of current industry trends. The ideal candidate will collaborate with cross-functional teams to build production-ready AI solutions that address real-world business challenges while keeping our platforms at the forefront of AI innovation.
Key Tasks and Accountability:
- Optimize and customize large-scale generative models (LLMs) for efficient inference and serving.
- Apply and evaluate advanced model optimization techniques such as quantization, pruning, distillation, tensor parallelism, and caching strategies to enhance model efficiency, throughput, and inference performance.
- Implement custom fine-tuning pipelines using parameter-efficient methods (LoRA, QLoRA, adapters, etc.) to achieve task-specific goals while minimizing compute overhead.
- Optimize runtime performance of inference stacks using frameworks like vLLM, TensorRT-LLM, DeepSpeed-Inference, and Hugging Face Accelerate.
- Design and implement scalable model-serving architectures on GPU clusters and cloud infrastructure (AWS, GCP, or Azure).
- Work closely with platform and infrastructure teams to reduce latency, memory footprint, and cost-per-token during production inference.
- Evaluate hardware–software co-optimization strategies across GPUs (NVIDIA A100/H100), TPUs, or custom accelerators.
- Monitor and profile performance using tools such as Nsight, PyTorch Profiler, and Triton Metrics to drive continuous improvement.
Key Requirements:
Education & Experience
- Ph.D. in Computer Science or a related field, with a specialization in Deep Learning, Generative AI, or Artificial Intelligence and Machine Learning (AI/ML).
- 2–3 years of hands-on experience in large language model (LLM) or deep learning optimization, gained through academic or industry work.
Skills
- Strong analytical and mathematical reasoning ability with a focus on measurable performance gains.
- Collaborative mindset, with ability to work across research, engineering, and product teams.
- Pragmatic problem-solver who values efficiency, reproducibility, and maintainable code over theoretical exploration.
- Curiosity-driven attitude: keeps up with emerging model compression and inference technologies.
What You'll Do
- Take ownership of the end-to-end optimization lifecycle, from profiling bottlenecks to delivering production-optimized LLMs.
- Develop custom inference pipelines capable of high throughput and low latency under real-world traffic.
- Build and maintain internal libraries, wrappers, and benchmarking suites for continuous performance evaluation.
What You Will Bring
- Hands-on experience building and optimizing machine learning or agentic systems at scale.
- A builder's mindset: bias toward action, comfort with experimentation, and enthusiasm for solving complex, open-ended challenges.
- Startup DNA: comfort with ambiguity, love for fast iteration, and a flexible, growth-oriented mindset.
Why Join Us
- Shape a first-of-its-kind AI + clean energy platform.
- Work with a small, mission-driven team obsessed with impact.
- An aggressive growth path.
- A chance to leave your mark at the intersection of AI and sustainability.