Inference Optimization Engineer (LLM and Runtime)

1 week ago

Delhi, India Sustainability Economics.ai Full time

Location: Bengaluru, Karnataka

About the Company:
Sustainability Economics.ai is a global organization pioneering the convergence of clean energy and AI, enabling profitable energy transitions while powering end-to-end AI infrastructure. By integrating AI-driven cloud solutions with sustainable energy, we create scalable, intelligent ecosystems that drive efficiency, innovation, and long-term impact across industries. Guided by exceptional leaders and visionaries with decades of expertise in finance, policy, technology, and innovation, we are committed to making long-term efforts to fulfil this vision through our technical innovation, client services, expertise, and capability expansion.

Role Summary:
We are seeking a highly skilled and innovative Inference Optimization Engineer (LLM and Runtime) to design, develop, and optimize cutting-edge AI systems that power intelligent, scalable, and agent-driven workflows. This role blends the frontier of generative AI research with robust engineering, requiring expertise in machine learning, deep learning, and large language models (LLMs), along with awareness of current industry trends. The ideal candidate will collaborate with cross-functional teams to build production-ready AI solutions that address real-world business challenges while keeping our platforms at the forefront of AI innovation.

Key Tasks and Accountability:
- Optimize and customize large-scale generative models (LLMs) for efficient inference and serving.
- Apply and evaluate advanced model optimization techniques such as quantization, pruning, distillation, tensor parallelism, and caching strategies to enhance model efficiency, throughput, and inference performance.
- Implement custom fine-tuning pipelines using parameter-efficient methods (LoRA, QLoRA, adapters, etc.) to achieve task-specific goals while minimizing compute overhead.
- Optimize runtime performance of inference stacks using frameworks like vLLM, TensorRT-LLM, DeepSpeed-Inference, and Hugging Face Accelerate.
- Design and implement scalable model-serving architectures on GPU clusters and cloud infrastructure (AWS, GCP, or Azure).
- Work closely with platform and infrastructure teams to reduce latency, memory footprint, and cost-per-token during production inference.
- Evaluate hardware–software co-optimization strategies across GPUs (NVIDIA A100/H100), TPUs, or custom accelerators.
- Monitor and profile performance using tools such as Nsight, PyTorch Profiler, and Triton Metrics to drive continuous improvement.

Key Requirements:

Education & Experience
- Ph.D. in Computer Science or a related field, with a specialization in Deep Learning, Generative AI, or Artificial Intelligence and Machine Learning (AI/ML).
- 2–3 years of hands-on experience in large language model (LLM) or deep learning optimization, gained through academic or industry work.

Skills
- Strong analytical and mathematical reasoning ability with a focus on measurable performance gains.
- Collaborative mindset, with the ability to work across research, engineering, and product teams.
- Pragmatic problem-solver who values efficiency, reproducibility, and maintainable code over theoretical exploration.
- Curiosity-driven attitude: keeps up with emerging model compression and inference technologies.

What You'll Do
- Take ownership of the end-to-end optimization lifecycle, from profiling bottlenecks to delivering production-optimized LLMs.
- Develop custom inference pipelines capable of high throughput and low latency under real-world traffic.
- Build and maintain internal libraries, wrappers, and benchmarking suites for continuous performance evaluation.

What You Will Bring
- Hands-on experience in building and optimizing machine learning or agentic systems at scale.
- A builder's mindset: bias toward action, comfort with experimentation, and enthusiasm for solving complex, open-ended challenges.
- Startup DNA: bias to action, comfort with ambiguity, love of fast iteration, and a flexible growth mindset.

Why Join Us
- Shape a first-of-its-kind AI + clean energy platform.
- Work with a small, mission-driven team obsessed with impact.
- An aggressive growth path.
- A chance to leave your mark at the intersection of AI and sustainability.

Inference Optimization Engineer (LLM and Runtime)

2 weeks ago

New Delhi, India Sustainability Economics.ai Full time

Location: Bengaluru, Karnataka About the Company: Sustainability Economics.ai is a global organization pioneering the convergence of clean energy and AI, enabling profitable energy transitions while powering end-to-end AI infrastructure. By integrating AI-driven cloud solutions with sustainable energy, we create scalable, intelligent ecosystems that drive...
Inference Optimization Engineer (LLM and Runtime)

3 weeks ago

New Delhi, India Sustainability Economics.ai Full time

Location: Bengaluru, Karnataka About the Company: Sustainability Economics.ai is a global organization pioneering the convergence of clean energy and AI, enabling profitable energy transitions while powering end-to-end AI infrastructure. By integrating AI-driven cloud solutions with sustainable energy, we create scalable, intelligent ecosystems that drive...
Senior LLM Engineer

23 hours ago

Delhi, India Infocusp Innovations Full time

About the Role: The Senior Agentic AI Engineer will lead the design, development, and deployment of complex, autonomous agentic AI solutions within production environments, working across multi-agent frameworks and orchestrating large language models. The position is for scalable applied AI, with a focus on delivering robust and fair agentic systems that...
Senior LLM Engineer

7 hours ago

New Delhi, India Infocusp Innovations Full time

About the Role: The Senior Agentic AI Engineer will lead the design, development, and deployment of complex, autonomous agentic AI solutions within production environments, working across multi-agent frameworks and orchestrating large language models. The position is for scalable applied AI, with a focus on delivering robust and fair agentic systems that...
MLOps & AI Infrastructure Engineer – Scalable LLM Deployment (Telecom)

2 weeks ago

New Delhi, India Mobileum Full time

About Us: Mobileum is a leading provider of Telecom analytics solutions for roaming, core network, security, risk management, domestic and international connectivity testing, and customer intelligence. More than 1,000 customers rely on its Active Intelligence platform, which provides advanced analytics solutions, allowing customers to connect deep network and...
MLOps & AI Infrastructure Engineer – Scalable LLM Deployment (Telecom)

1 week ago

New Delhi, India Mobileum Full time

About Us: Mobileum is a leading provider of Telecom analytics solutions for roaming, core network, security, risk management, domestic and international connectivity testing, and customer intelligence. More than 1,000 customers rely on its Active Intelligence platform, which provides advanced analytics solutions, allowing customers to connect deep network and...
AI Engineer

1 week ago

New Delhi, India Kayana | Ordering & Payment Solutions Full time

Job Title: AI Engineer (LLMs, Agentic Systems & Model Training) Location: Mumbai Employment Type: Full-Time Experience Level: Mid–Senior About the Role: We are seeking a highly skilled AI Engineer with deep expertise in Large Language Models (LLMs), AI Agents, and advanced retrieval and fine-tuning techniques. The ideal candidate has hands-on experience training...
AI Engineer

3 days ago

New Delhi, India Kayana | Ordering & Payment Solutions Full time

Job Title: AI Engineer (LLMs, Agentic Systems & Model Training) Location: Mumbai Employment Type: Full-Time Experience Level: Mid–Senior About the Role: We are seeking a highly skilled AI Engineer with deep expertise in Large Language Models (LLMs), AI Agents, and advanced retrieval and fine-tuning techniques. The ideal candidate has hands-on experience training...
Artificial Intelligence Engineer

7 days ago

New Delhi, India RxOne (Rx One Care) Full time

Job Title: Artificial Intelligence (AI) Engineer Location: Gurugram, India (Onsite Only) Company: Rx One Care Pvt. Ltd. Type: Full-Time | Immediate Joining Preferred Role Overview: deployments, and scalable backend services. You will play a key role in building, fine-tuning, and deploying AI-powered solutions (voice, NLP, automation) that power RxOne’s next-gen...
