Inference Optimization Engineer

3 days ago


Tumkūr, India Sustainability Economics.ai Full time

Location: Bengaluru, Karnataka

About the Company:
Sustainability Economics.ai is a global organization pioneering the convergence of clean energy and AI, enabling profitable energy transitions while powering end-to-end AI infrastructure. By integrating AI-driven cloud solutions with sustainable energy, we create scalable, intelligent ecosystems that drive efficiency, innovation, and long-term impact across industries. Guided by exceptional leaders and visionaries with decades of expertise in finance, policy, technology, and innovation, we are committed to fulfilling this vision through technical innovation, client services, expertise, and capability expansion.

Role Summary:
We are seeking a highly skilled and innovative Inference Optimization Engineer (LLM and Runtime) to design, develop, and optimize cutting-edge AI systems that power intelligent, scalable, and agent-driven workflows. This role blends frontier generative AI research with robust engineering, and requires expertise in machine learning, deep learning, and large language models (LLMs), along with awareness of current industry trends. The ideal candidate will collaborate with cross-functional teams to build production-ready AI solutions that address real-world business challenges while keeping our platforms at the forefront of AI innovation.

Key Tasks and Accountability:
- Optimize and customize large-scale generative models (LLMs) for efficient inference and serving.
- Apply and evaluate advanced model optimization techniques such as quantization, pruning, distillation, tensor parallelism, and caching strategies to improve model efficiency, throughput, and inference performance.
- Implement custom fine-tuning pipelines using parameter-efficient methods (LoRA, QLoRA, adapters, etc.) to achieve task-specific goals while minimizing compute overhead.
- Optimize runtime performance of inference stacks using frameworks such as vLLM, TensorRT-LLM, DeepSpeed-Inference, and Hugging Face Accelerate.
- Design and implement scalable model-serving architectures on GPU clusters and cloud infrastructure (AWS, GCP, or Azure).
- Work closely with platform and infrastructure teams to reduce latency, memory footprint, and cost-per-token during production inference.
- Evaluate hardware–software co-optimization strategies across GPUs (NVIDIA A100/H100), TPUs, or custom accelerators.
- Monitor and profile performance using tools such as Nsight, PyTorch Profiler, and Triton Metrics to drive continuous improvement.

Key Requirements:

Education & Experience:
- Ph.D. in Computer Science or a related field, with a specialization in Deep Learning, Generative AI, or Artificial Intelligence and Machine Learning (AI/ML).
- 2–3 years of hands-on experience in large language model (LLM) or deep learning optimization, gained through academic or industry work.

Skills:
- Strong analytical and mathematical reasoning ability with a focus on measurable performance gains.
- Collaborative mindset, with the ability to work across research, engineering, and product teams.
- Pragmatic problem-solver who values efficiency, reproducibility, and maintainable code over theoretical exploration.
- Curiosity-driven attitude: keeps up with emerging model compression and inference technologies.

What You’ll Do:
- Take ownership of the end-to-end optimization lifecycle, from profiling bottlenecks to delivering production-optimized LLMs.
- Develop custom inference pipelines capable of high throughput and low latency under real-world traffic.
- Build and maintain internal libraries, wrappers, and benchmarking suites for continuous performance evaluation.

What You Will Bring:
- Hands-on experience building and optimizing machine learning or agentic systems at scale.
- A builder’s mindset: bias toward action, comfort with experimentation, and enthusiasm for solving complex, open-ended challenges.
- Startup DNA: comfort with ambiguity, love of fast iteration, and a flexible growth mindset.

Why Join Us:
- Shape a first-of-its-kind AI + clean energy platform.
- Work with a small, mission-driven team obsessed with impact.
- An aggressive growth path.
- A chance to leave your mark at the intersection of AI and sustainability.



  • Tumkūr, India Talentoj Full time

    Role Purpose: You will work with a leading technology organization focused on building highly scalable, high-performance ML model-serving infrastructure. As a Software Development Engineer in the MLOps team, you will design and develop robust systems that enable efficient, reliable, and optimized model deployment at scale. Role Value: As a Software Engineer...

  • Senior AI/ML Engineer

    2 weeks ago


    Tumkūr, India RingCentral Full time

    Job Description: We are seeking an experienced AI Engineer with a strong background in Natural Language Understanding (NLU) who is passionate about pushing the boundaries of Conversational AI. In this role, you will design, develop, and deploy scalable AI solutions leveraging LLMs, Retrieval-Augmented Generation (RAG), and prompt engineering techniques to...


  • Tumkūr, India HCLTech Full time

    Job Title: MLOps Engineer / ML Engineer Experience: 5–20 years Location: Chennai / Bangalore / Hyderabad / Pune / Noida / Mumbai Job Overview: We are looking for an experienced MLOps Engineer to help deploy, scale, and manage machine learning models in production environments. You will work closely with data scientists and engineering teams to automate...


  • Tumkūr, India OpEase Full time

    Machine Learning Engineer (2D→3D Reconstruction & Workflow Intelligence) OpEase Technologies builds a high-precision, web-based surgical planning platform for orthopedic and spine surgeons. Doctors use OpEase to securely store patient data, upload X-rays, calibrate, measure, and plan surgeries through advanced geometry tools and clinical logic. Role...


  • Tumkūr, India Equicom Technologies Full time

    Company Description Equicom Technologies combines financial expertise with advanced technological innovation to develop state-of-the-art software solutions for the finance industry. Focused on driving transformation, the company delivers intelligent, secure, and scalable technologies to meet the complex demands of modern finance. By leveraging data-driven...

  • Staff Data Scientist

    2 weeks ago


    Tumkūr, India Auxia Full time

    Auxia is building the Agentic Customer Journey Orchestration Platform, redefining how enterprises activate, engage, and retain their customers through intelligent, adaptive AI systems. Backed by $23.5M in funding from top-tier investors — VMG Technology Partners, Stage 2 Capital, and MUFG Innovation Partners — we’re on a mission to make every...


  • Tumkūr, India apna Full time

    Job Title: Senior Security Engineer (Sr. SE) Location: Bengaluru Employment Type: Full-time Team: Security Engineering Role Overview: As a Senior Security Engineer, you will play a key role in strengthening the company’s overall security posture across our AI platforms, microservices, data pipelines, and mobile/web products. You will design, build and...

  • Data Engineer

    2 weeks ago


    Tumkūr, India Zilo AI Full time

    Company Description Zilo AI is a prominent manpower service provider dedicated to connecting businesses with highly skilled and dependable professionals across various industries. With a deep understanding that talent drives success, Zilo AI is committed to supplying the right expertise to support business growth. Our focus on talent optimization ensures...

  • AI Engineer

    2 weeks ago


    Tumkūr, India Workfall Full time

    We are looking for an experienced AI/LLM Engineer to design, build, and maintain intelligent applications powered by Large Language Models (LLMs), embeddings, similarity search, and vector databases. The ideal candidate will work on building real-time AI systems such as chatbots, semantic search, recommendation systems, document intelligence, and autonomous...