Inference Optimization Engineer (LLM and Runtime)
1 week ago
Location: Bengaluru, Karnataka

About the Company:
Sustainability Economics.ai is a global organization pioneering the convergence of clean energy and AI, enabling profitable energy transitions while powering end-to-end AI infrastructure. By integrating AI-driven cloud solutions with sustainable energy, we create scalable, intelligent ecosystems that drive efficiency, innovation, and long-term impact across industries. Guided by exceptional leaders and visionaries with decades of expertise in finance, policy, technology, and innovation, we are committed to fulfilling this vision through technical innovation, client services, expertise, and capability expansion.

Role Summary:
We are seeking a highly skilled and innovative Inference Optimization Engineer (LLM and Runtime) to design, develop, and optimize cutting-edge AI systems that power intelligent, scalable, and agent-driven workflows. This role blends the frontier of generative AI research with robust engineering, requiring expertise in machine learning, deep learning, and large language models (LLMs), along with awareness of the latest industry trends. The ideal candidate will collaborate with cross-functional teams to build production-ready AI solutions that address real-world business challenges while keeping our platforms at the forefront of AI innovation.

Key Tasks and Accountability:
Optimize and customize large-scale generative models (LLMs) for efficient inference and serving.
Apply and evaluate advanced model optimization techniques such as quantization, pruning, distillation, tensor parallelism, and caching strategies to enhance model efficiency, throughput, and inference performance.
Implement custom fine-tuning pipelines using parameter-efficient methods (LoRA, QLoRA, adapters, etc.) to achieve task-specific goals while minimizing compute overhead.
Optimize runtime performance of inference stacks using frameworks like vLLM, TensorRT-LLM, DeepSpeed-Inference, and Hugging Face Accelerate.
Design and implement scalable model-serving architectures on GPU clusters and cloud infrastructure (AWS, GCP, or Azure).
Work closely with platform and infrastructure teams to reduce latency, memory footprint, and cost-per-token during production inference.
Evaluate hardware–software co-optimization strategies across GPUs (NVIDIA A100/H100), TPUs, or custom accelerators.
Monitor and profile performance using tools such as Nsight, PyTorch Profiler, and Triton Metrics to drive continuous improvement.

Key Requirements:
Education & Experience
Ph.D. in Computer Science or a related field, with a specialization in Deep Learning, Generative AI, or Artificial Intelligence and Machine Learning (AI/ML).
2–3 years of hands-on experience in large language model (LLM) or deep learning optimization, gained through academic or industry work.

Skills
Strong analytical and mathematical reasoning ability with a focus on measurable performance gains.
Collaborative mindset, with the ability to work across research, engineering, and product teams.
Pragmatic problem-solver who values efficiency, reproducibility, and maintainable code over theoretical exploration.
Curiosity-driven attitude: keeps up with emerging model compression and inference technologies.

What You’ll Do
Take ownership of the end-to-end optimization lifecycle, from profiling bottlenecks to delivering production-optimized LLMs.
Develop custom inference pipelines capable of high throughput and low latency under real-world traffic.
Build and maintain internal libraries, wrappers, and benchmarking suites for continuous performance evaluation.

What You Will Bring
Hands-on experience in building and optimizing machine learning or agentic systems at scale.
A builder’s mindset: bias toward action, comfort with experimentation, and enthusiasm for solving complex, open-ended challenges.
Startup DNA: bias to action, comfort with ambiguity, love for fast iteration, and a flexible, growth-oriented mindset.

Why Join Us
Shape a first-of-its-kind AI + clean energy platform.
Work with a small, mission-driven team obsessed with impact.
An aggressive growth path.
A chance to leave your mark at the intersection of AI and sustainability.
-
Delhi, India Sustainability Economics.ai Full time Location: Bengaluru, Karnataka About the Company: Sustainability Economics.ai is a global organization pioneering the convergence of clean energy and AI, enabling profitable energy transitions while powering end-to-end AI infrastructure. By integrating AI-driven cloud solutions with sustainable energy, we create scalable, intelligent ecosystems that drive...
-
Senior LLM Engineer
4 weeks ago
New Delhi, India RingCentral Full time Job Description: We are seeking an experienced AI Engineer with a strong background in Natural Language Understanding (NLU) who is passionate about pushing the boundaries of Conversational AI. In this role, you will design, develop, and deploy scalable AI solutions leveraging LLMs, Retrieval-Augmented Generation (RAG), and prompt engineering techniques to...
-
New Delhi, India Mobileum Full time About Us: Mobileum is a leading provider of Telecom analytics solutions for roaming, core network, security, risk management, domestic and international connectivity testing, and customer intelligence. More than 1,000 customers rely on its Active Intelligence platform, which provides advanced analytics solutions, allowing customers to connect deep network and...
-
Distinguished LLM Engineer
4 weeks ago
New Delhi, India Trident Consulting Full time Trident Consulting is looking for a "Distinguished LLM Engineer - Chennai/Tirunelveli/Coimbatore". Role: Distinguished LLM Engineer Location: Chennai/Tirunelveli/Coimbatore Type: Full-time Salary: Depends on your experience and the current market rate. Do you want to use your AI expertise to drive real-world impact? We’re hiring a Distinguished LLM...
-
Generative AI Engineer
4 weeks ago
New Delhi, India DIGI9 Full time Job Description: Senior AI Engineer – VoizPanda (AI Multi-Calling Platform) Location: Basaveshwara Nagar, Bangalore Mode: Onsite (Full-time) Experience: 2–7 years Compensation: ₹6–9 LPA About VoizPanda: VoizPanda is Digi9’s in-house AI-powered multi-calling platform that enables enterprises to run thousands of simultaneous voice calls handled entirely by...
-
Lead AI Engineer – LLM
5 days ago
New Delhi, India Senzcraft Full time About Senzcraft: Founded by IIM Bangalore and IEST Shibpur alumni, Senzcraft is a hyper-automation company. Senzcraft's vision is to radically simplify today's work and design business processes for the future using intelligent process automation technologies. We have a suite of SaaS products and services, partnering with automation product companies. Please...
-
Innefu Labs
3 weeks ago
Delhi Division, India Innefu Labs Pvt. Ltd. Full time About the Company: Founded in 2010, Innefu is an AI-driven R&D company focused on cutting-edge Data Analytics and Information Security solutions. With over 100 installations across the Indian Subcontinent, Middle East, and Southeast Asia, we provide AI-powered solutions to defense, law enforcement, financial institutions, and Fortune 500...
-
DevOps Engineer
4 weeks ago
New Delhi, India Tipstat® Full time We are looking for a highly skilled DevOps Engineer with strong experience in DevSecOps and MLOps / LLMOps to design, automate, and secure our development and deployment pipelines. You will play a critical role in building scalable, secure, and production-ready infrastructure to support both traditional applications and machine learning / LLM workloads. This...
-
Technical Team Lead – LLM Systems
5 days ago
New Delhi, India Balbix Full time Who We Are: Balbix is the world's leading platform for cybersecurity posture automation. Using Balbix, organizations can discover, prioritize and mitigate unseen risks and vulnerabilities at high velocity. With seamless data collection and petabyte-scale analysis capabilities, Balbix is deployed and operational within hours, and helps to decrease breach risk...
-
AI/ML Engineer
1 week ago
New Delhi, India RingCentral Full time Job Description: We are seeking an experienced AI Engineer with a strong background in Natural Language Understanding (NLU) who is passionate about pushing the boundaries of Conversational AI. In this role, you will design, develop, and deploy scalable AI solutions leveraging LLMs, Retrieval-Augmented Generation (RAG), and prompt engineering techniques to...