Inference Optimization Engineer (LLM and Runtime)

1 week ago


Kurnool, India Sustainability Economics.ai Full time

Location: Bengaluru, Karnataka

About the Company:

Sustainability Economics.ai is a global organization pioneering the convergence of clean energy and AI, enabling profitable energy transitions while powering end-to-end AI infrastructure. By integrating AI-driven cloud solutions with sustainable energy, we create scalable, intelligent ecosystems that drive efficiency, innovation, and long-term impact across industries. Guided by exceptional leaders and visionaries with decades of expertise in finance, policy, technology, and innovation, we are committed to fulfilling this vision over the long term through technical innovation, client services, expertise, and capability expansion.

Role Summary:

We are seeking a highly skilled and innovative Inference Optimization Engineer (LLM and Runtime) to design, develop, and optimize cutting-edge AI systems that power intelligent, scalable, and agent-driven workflows. This role blends frontier generative AI research with robust engineering and requires expertise in machine learning, deep learning, and large language models (LLMs), along with awareness of the latest industry trends. The ideal candidate will collaborate with cross-functional teams to build production-ready AI solutions that address real-world business challenges while keeping our platforms at the forefront of AI innovation.

Key Tasks and Accountability:

  • Optimize and customize large-scale generative models (LLMs) for efficient inference and serving.
  • Apply and evaluate advanced model optimization techniques such as quantization, pruning, distillation, tensor parallelism, and caching strategies to enhance model efficiency, throughput, and inference performance.
  • Implement custom fine-tuning pipelines using parameter-efficient methods (LoRA, QLoRA, adapters, etc.) to achieve task-specific goals while minimizing compute overhead.
  • Optimize runtime performance of inference stacks using frameworks like vLLM, TensorRT-LLM, DeepSpeed-Inference, and Hugging Face Accelerate.
  • Design and implement scalable model-serving architectures on GPU clusters and cloud infrastructure (AWS, GCP, or Azure).
  • Work closely with platform and infrastructure teams to reduce latency, memory footprint, and cost-per-token during production inference.
  • Evaluate hardware–software co-optimization strategies across GPUs (NVIDIA A100/H100), TPUs, or custom accelerators.
  • Monitor and profile performance using tools such as Nsight, PyTorch Profiler, and Triton Metrics to drive continuous improvement.

Key Requirements:

Education & Experience

  • Ph.D. in Computer Science or a related field, with a specialization in Deep Learning, Generative AI, or Artificial Intelligence and Machine Learning (AI/ML).
  • 2–3 years of hands-on experience in large language model (LLM) or deep learning optimization, gained through academic or industry work.

Skills

  • Strong analytical and mathematical reasoning ability with a focus on measurable performance gains.
  • Collaborative mindset, with the ability to work across research, engineering, and product teams.
  • Pragmatic problem-solver who values efficiency, reproducibility, and maintainable code over theoretical exploration.
  • Curiosity-driven attitude: keeps up with emerging model compression and inference technologies.

What You’ll Do

  • Take ownership of the end-to-end optimization lifecycle, from profiling bottlenecks to delivering production-optimized LLMs.
  • Develop custom inference pipelines capable of high throughput and low latency under real-world traffic.
  • Build and maintain internal libraries, wrappers, and benchmarking suites for continuous performance evaluation.

What You Will Bring

  • Hands-on experience in building and optimizing machine learning or Agentic Systems at scale.
  • A builder’s mindset: bias toward action, comfort with experimentation, and enthusiasm for solving complex, open-ended challenges.
  • Startup DNA: bias to action, comfort with ambiguity, love for fast iteration, and a flexible, growth-oriented mindset.

Why Join Us

  • Shape a first-of-its-kind AI + clean energy platform.
  • Work with a small, mission-driven team obsessed with impact.
  • An aggressive growth path.
  • A chance to leave your mark at the intersection of AI and sustainability.



  • Kurnool, India Scienaptic AI Full time

    Job Title: Lead Platform Engineer – Agentic System | Location: Bengaluru | Department: Engineering – AI Platform | Experience: 8+ years (with a minimum of 2 years of hands-on experience designing and architecting production-grade Agentic AI and LLM-based systems with all NFRs covered) | About the Role: Our organization is developing an Agentic AI Platform designed to...


  • Kurnool, India Scienaptic AI Full time

    Our organization is developing an Agentic AI Platform designed to orchestrate intelligent, autonomous workflows that drive decision-making, automation, and innovation across diverse domains. As the Lead AI Platform Engineer, you will be responsible for architecting and implementing the foundational elements of this platform. This includes designing, deploying...



  • AI/ML & Data Engineer

    4 weeks ago


    Kurnool, India Mindfire Solutions Full time

    About the Job: We are looking for an experienced AI/ML & Data Engineer to design, develop, and deploy scalable machine learning models and data infrastructure on AWS. You will work closely with cross-functional teams to deliver AI-driven solutions, integrate large language models (LLMs), and optimize data workflows while ensuring security, scalability, and...


  • Kurnool, India Mantras2Success.com Full time

    Designation: AI/ML Engineer | Location: Gandhinagar | Experience Range: 4+ years | Salary Range: up to 15 LPA | Job Profile: We’re looking for an AI/ML Engineer with hands-on expertise in LLMs, Agentic AI, NLP and some Computer Vision. You’ll build scalable AI solutions using transformer-based models, LLM-powered chat systems, and vector databases, primarily in...




  • Kurnool, India People Prime Worldwide Full time

    About Client: Our client is a global IT services company headquartered in Southborough, Massachusetts, USA. Founded in 1996, with revenue of $1.8 B and 35,000+ associates worldwide, it specializes in digital engineering and IT services, helping clients modernize their technology infrastructure, adopt cloud and AI solutions, and accelerate...


  • Kurnool, India Recro Full time

    🎙️Hiring: ML Research Engineer (ASR & Fine-tuning Specialist) 📍 Location: Bangalore | 🧠 Experience: 2+ Years | 💼 Full-time | On-site 🧩 What You’ll Do 🎯 Train & fine-tune ASR models (Whisper, Wav2Vec2, Conformer) for multilingual, healthcare-focused speech data. 🧠 Build, optimize, and fine-tune NLP components (Intent, NER, Entity...


  • Kurnool, India Crum & Forster Full time

    The Company: Crum & Forster (C&F), with a proud history dating to 1822, provides specialty and standard commercial lines insurance products through our admitted and surplus lines insurance companies. Approaching $6 billion in written premium as of 2024, C&F enjoys an “A+” (Excellent) financial strength rating by A.M. Best. C&F is part of Fairfax Financial...

  • Data Scientist

    2 weeks ago


    Kurnool, India Turing Full time

    Role Overview: Join our global AI team and help shape the future of intelligent systems. We’re looking for Data Scientists and Analysts skilled in Python who love solving tough problems, building smarter models, and turning data into real-world impact. Work with top US-based companies creating cutting-edge AI and ML solutions, from fine-tuning LLMs to...