Inference Optimization Engineer (LLM and Runtime)

2 weeks ago


Bangalore, India Sustainability Economics.ai Full time

Location: Bengaluru, Karnataka

About the Company:
Sustainability Economics.ai is a global organization pioneering the convergence of clean energy and AI, enabling profitable energy transitions while powering end-to-end AI infrastructure. By integrating AI-driven cloud solutions with sustainable energy, we create scalable, intelligent ecosystems that drive efficiency, innovation, and long-term impact across industries. Guided by exceptional leaders and visionaries with decades of expertise in finance, policy, technology, and innovation, we are committed to making long-term efforts to fulfil this vision through our technical innovation, client services, expertise, and capability expansion.

Role Summary:
We are seeking a highly skilled and innovative Inference Optimization Engineer (LLM and Runtime) to design, develop, and optimize cutting-edge AI systems that power intelligent, scalable, and agent-driven workflows. This role blends the frontier of generative AI research with robust engineering. It requires expertise in machine learning, deep learning, and large language models (LLMs), along with awareness of the latest industry trends. The ideal candidate will collaborate with cross-functional teams to build production-ready AI solutions that address real-world business challenges while keeping our platforms at the forefront of AI innovation.

Key Tasks and Accountability:
• Optimize and customize large-scale generative models (LLMs) for efficient inference and serving.
• Apply and evaluate advanced model optimization techniques such as quantization, pruning, distillation, tensor parallelism, and caching strategies to improve model efficiency, throughput, and inference performance.
• Implement custom fine-tuning pipelines using parameter-efficient methods (LoRA, QLoRA, adapters, etc.) to achieve task-specific goals while minimizing compute overhead.
• Optimize runtime performance of inference stacks using frameworks such as vLLM, TensorRT-LLM, DeepSpeed-Inference, and Hugging Face Accelerate.
• Design and implement scalable model-serving architectures on GPU clusters and cloud infrastructure (AWS, GCP, or Azure).
• Work closely with platform and infrastructure teams to reduce latency, memory footprint, and cost per token during production inference.
• Evaluate hardware–software co-optimization strategies across GPUs (NVIDIA A100/H100), TPUs, or custom accelerators.
• Monitor and profile performance using tools such as Nsight, PyTorch Profiler, and Triton Metrics to drive continuous improvement.

Key Requirements:
Education & Experience
• Ph.D. in Computer Science or a related field, with a specialization in Deep Learning, Generative AI, or Artificial Intelligence and Machine Learning (AI/ML).
• 2–3 years of hands-on experience in large language model (LLM) or deep learning optimization, gained through academic or industry work.

Skills
• Strong analytical and mathematical reasoning ability, with a focus on measurable performance gains.
• Collaborative mindset, with the ability to work across research, engineering, and product teams.
• Pragmatic problem-solver who values efficiency, reproducibility, and maintainable code over theoretical exploration.
• Curiosity-driven attitude: keeps up with emerging model compression and inference technologies.

What You'll Do:
• Take ownership of the end-to-end optimization lifecycle, from profiling bottlenecks to delivering production-optimized LLMs.
• Develop custom inference pipelines capable of high throughput and low latency under real-world traffic.
• Build and maintain internal libraries, wrappers, and benchmarking suites for continuous performance evaluation.

What You Will Bring:
• Hands-on experience building and optimizing machine learning or agentic systems at scale.
• A builder's mindset: bias toward action, comfort with experimentation, and enthusiasm for solving complex, open-ended challenges.
• Startup DNA: bias to action, comfort with ambiguity, love for fast iteration, and a flexible, growth-oriented mindset.

Why Join Us:
• Shape a first-of-its-kind AI + clean energy platform.
• Work with a small, mission-driven team obsessed with impact.
• An aggressive growth path.
• A chance to leave your mark at the intersection of AI and sustainability.



  • Bangalore, India Mulya Technologies Full time

    Principal Machine Learning Engineer - Multimodal AI & Inference. Bangalore. Founded in 2023 by industry veterans, HQ in California, US. We are revolutionizing sustainable AI compute through intuitive software with composable silicon. Overview: You will design, optimize, and deploy large multimodal models (language, vision, audio, video) to run efficiently on a...


  • Senior AI

    1 week ago


    Bangalore, India CareerXperts Consulting Full time

    Our Client - We're pioneering a fundamental shift in cybersecurity—moving organizations from fragmented, reactive defense to unified, proactive protection. Our AI-powered platform synthesizes intelligence from 150+ disparate security tools, transforming overwhelming noise into crystal-clear risk prioritization through breakthrough predictive technology...

  • Bangalore, India Taglynk Full time

    Role We’re hiring a GPU Optimization Engineer who understands GPUs at a deep, architectural level — someone who knows exactly how to squeeze every last millisecond out of a model, what GPU constraints matter, and how to restructure models for real-world inference performance. You’ll work across CUDA kernels, model graph optimizations, hardware-specific...


  • AI Runtime Engineer

    1 week ago


    Bangalore, India Capgemini Engineering Full time

    AI Runtime Engineer. Location: Bangalore (or as per requirement). Experience: 7+ years. Choosing Capgemini means joining a team where you'll be empowered to build cutting-edge AI infrastructure, supported by a collaborative global community, and inspired to reimagine what's possible. Join us in enabling scalable, fault-tolerant AI systems that power...