GPU Optimization Engineer

16 hours ago


Bangalore, India Taglynk Full time

Role

We’re hiring a GPU Optimization Engineer who understands GPUs at a deep, architectural level: someone who knows exactly how to squeeze every last millisecond out of a model, what GPU constraints matter, and how to restructure models for real-world inference performance. You’ll work across CUDA kernels, model graph optimizations, hardware-specific tuning, and porting models across GPU architectures. Your work directly impacts the latency, throughput, and reliability of smallest’s real-time speech models.

What You’ll Do

  • Optimize model architectures (ASR, TTS, SLMs) for maximum performance on specific GPU hardware
  • Profile models end-to-end to identify GPU bottlenecks: memory bandwidth, kernel launch overhead, fusion opportunities, quantization constraints
  • Design and implement custom kernels (CUDA/Triton/tinygrad) for performance-critical model sections
  • Perform operator fusion, graph optimization, and kernel-level scheduling improvements
  • Tune models to fit GPU memory limits while maintaining quality
  • Benchmark and calibrate inference across NVIDIA, AMD, and potentially emerging accelerators
  • Port models across GPU chipsets (NVIDIA → AMD / edge GPUs / new compute backends)
  • Work with TensorRT, ONNX Runtime, and custom runtimes for deployment
  • Partner with the research and infra teams to ensure the entire stack is optimized for real-time workloads

Requirements

  • Strong understanding of GPU architecture: SMs, warps, memory hierarchy, occupancy tuning
  • Hands-on experience with CUDA, kernel writing, and kernel-level debugging
  • Experience with kernel fusion and model graph optimizations
  • Familiarity with TensorRT, ONNX, Triton, tinygrad, or similar inference engines
  • Strong proficiency in PyTorch and Python
  • Deep understanding of model architectures (transformers, convs, RNNs, attention, diffusion blocks)
  • Experience profiling GPU workloads using Nsight, nvprof, or similar tools
  • Strong problem-solving abilities with a performance-first mindset

Great to Have

  • Experience with quantization (INT8, FP8, hybrid formats)
  • Experience with audio/speech models (ASR, TTS, SSL, vocoders)
  • Contributions to open-source GPU stacks or inference runtimes
  • Published work related to systems-level model optimization

Who Will Succeed in This Role

Someone who:

  • thinks in kernels, not just layers
  • knows which optimizations are theoretical vs practically impactful
  • understands GPU boundaries (memory, bandwidth, latency) and how to work around them
  • is excited by the challenge of ultra-low latency and large-scale real-time inference
  • loves debugging at the CUDA + model level



  • Bangalore, India Taglynk Full time

    Role We’re hiring a GPU Optimization Engineer who understands GPUs at a deep, architectural level — someone who knows exactly how to squeeze every last millisecond out of a model, what GPU constraints matter, and how to restructure models for real-world inference performance. You’ll work across CUDA kernels, model graph optimizations, hardware-specific...


  • Bangalore Urban, India Taglynk Full time

    Role We’re hiring a GPU Optimization Engineer who understands GPUs at a deep, architectural level: someone who knows exactly how to squeeze every last millisecond out of a model, what GPU constraints matter, and how to restructure models for real-world inference performance. You’ll work across CUDA kernels, model graph optimizations, hardware-specific...


  • Bangalore, India beBeeSoftware Full time

    Job Opportunity: As a Senior Software Architect, you will oversee the entire software ecosystem for a novel GPU backend. Design and implement the end-to-end architectural framework of the GPU software stack, from compilers to drivers to runtimes, for AI training/inference and advanced graphics. Collaborate closely with hardware architects and engineers to...


  • Bangalore, India beBeeGpu Full time

    Job Title: Lead GPU/TPU Design Engineer. We are seeking a seasoned Senior IP/RTL Design Engineer to lead the design of high-performance TPU and GPU architectures. About this Role: This is a critical position for our team, as we aim to create next-generation AI accelerators. The ideal candidate will have extensive experience in ASIC/FPGA IP/RTL design and a deep...


  • Bangalore, Karnataka, India Qualcomm Full time

    Company: Qualcomm India Private Limited. Job Area: Engineering Group, Engineering Group > Hardware Engineering. General Summary: Qualcomm GPU team is actively seeking candidates for several physical design engineering positions. Graphics HW team in Bangalore is part of a worldwide team responsible for developing and delivering GPU solutions which are setting the...


  • Bangalore, India Mulya Technologies Full time

    Principal IP/RTL Design Engineer for TPU / GPU. Hyderabad / Bangalore. Founded by highly respected Silicon Valley veterans, with its design centers established in Santa Clara, California / Hyderabad / Bangalore. Our pay comprehensively beats "ALL" semiconductor product players in the Indian market. Position Overview: Seeking an IP/RTL Design Engineer with 5+...


  • Bangalore, India Mulya Technologies Full time

    Principal IP/RTL Design Engineer for TPU / GPU. Hyderabad / Bangalore. Founded by highly respected Silicon Valley veterans, with its design centers established in Santa Clara, California / Hyderabad / Bangalore. Our pay comprehensively beats "ALL" semiconductor product players in the Indian market. Position Overview: Seeking an IP/RTL Design Engineer with 5+...


  • Bangalore Division, India Best NanoTech Full time

    About the Company: Undisputed leader in AI computing. Our client is the world’s leading pioneer in accelerated computing. Originally known for inventing the GPU and revolutionizing gaming, they are now the primary force powering the AI era, providing the infrastructure for everything from self-driving cars to ChatGPT. You will be joining a trillion-dollar...


  • Bangalore, India Mulya Technologies Full time

    IP/RTL Design Architect for GPU. Hyderabad / Bangalore. Founded by highly respected Silicon Valley veterans, with its design centers established in Santa Clara, California / Hyderabad / Bangalore. Our pay comprehensively beats "ALL" semiconductor product players in the Indian market. Position Overview: Seeking an IP/RTL Design Engineer with 8+ years of...


  • Bangalore, India DC Tech Consulting Full time

    Senior Data Center Engineer – AI/ML & GPU Platforms. Location: Remote. Experience: 7+ Years. Type: Full-time. Role Overview: We are seeking a highly skilled Senior Data Center Compute Engineer to design, build, and operate GPU-enabled compute platforms for AI/ML and high-performance workloads. This role is heavily focused on Kubernetes, virtualization, orchestration...