GPU Optimization Engineer
16 hours ago
Role
We’re hiring a GPU Optimization Engineer who understands GPUs at a deep, architectural level: someone who knows exactly how to squeeze every last millisecond out of a model, what GPU constraints matter, and how to restructure models for real-world inference performance. You’ll work across CUDA kernels, model graph optimizations, hardware-specific tuning, and porting models across GPU architectures. Your work directly impacts the latency, throughput, and reliability of smallest’s real-time speech models.

What You’ll Do
Optimize model architectures (ASR, TTS, SLMs) for maximum performance on specific GPU hardware
Profile models end-to-end to identify GPU bottlenecks: memory bandwidth, kernel launch overhead, fusion opportunities, quantization constraints
Design and implement custom kernels (CUDA/Triton/tinygrad) for performance-critical model sections
Perform operator fusion, graph optimization, and kernel-level scheduling improvements
Tune models to fit GPU memory limits while maintaining quality
Benchmark and calibrate inference across NVIDIA, AMD, and potentially emerging accelerators
Port models across GPU chipsets (NVIDIA → AMD / edge GPUs / new compute backends)
Work with TensorRT, ONNX Runtime, and custom runtimes for deployment
Partner with the research and infra teams to ensure the entire stack is optimized for real-time workloads

Requirements
Strong understanding of GPU architecture: SMs, warps, memory hierarchy, occupancy tuning
Hands-on experience with CUDA, kernel writing, and kernel-level debugging
Experience with kernel fusion and model graph optimizations
Familiarity with TensorRT, ONNX, Triton, tinygrad, or similar inference engines
Strong proficiency in PyTorch and Python
Deep understanding of model architectures (transformers, convs, RNNs, attention, diffusion blocks)
Experience profiling GPU workloads using Nsight, nvprof, or similar tools
Strong problem-solving abilities with a performance-first mindset

Great to Have
Experience with quantization (INT8, FP8, hybrid formats)
Experience with audio/speech models (ASR, TTS, SSL, vocoders)
Contributions to open-source GPU stacks or inference runtimes
Published work related to systems-level model optimization

Who Will Succeed in This Role
Someone who:
thinks in kernels, not just layers
knows which optimizations are theoretical vs practically impactful
understands GPU boundaries (memory, bandwidth, latency) and how to work around them
is excited by the challenge of ultra-low latency and large-scale real-time inference
loves debugging at the CUDA + model level
-
GPU Optimization Engineer
21 hours ago
Bangalore, India Taglynk Full time
Role
We’re hiring a GPU Optimization Engineer who understands GPUs at a deep, architectural level: someone who knows exactly how to squeeze every last millisecond out of a model, what GPU constraints matter, and how to restructure models for real-world inference performance. You’ll work across CUDA kernels, model graph optimizations, hardware-specific...
-
GPU Optimization Engineer
24 hours ago
Bangalore Urban, India Taglynk Full time
Role
We’re hiring a GPU Optimization Engineer who understands GPUs at a deep, architectural level: someone who knows exactly how to squeeze every last millisecond out of a model, what GPU constraints matter, and how to restructure models for real-world inference performance. You’ll work across CUDA kernels, model graph optimizations, hardware-specific...
-
Chief GPU Software Designer
1 week ago
Bangalore, India beBeeSoftware Full time
Job Opportunity
As a Senior Software Architect, you will oversee the entire software ecosystem for a novel GPU backend. Design and implement the end-to-end architectural framework of the GPU software stack, from compilers to drivers to runtimes, for AI training/inference and advanced graphics. Collaborate closely with hardware architects and engineers to...
-
Lead GPU/TPU Design Engineer
1 week ago
Bangalore, India beBeeGpu Full time
Job Title: Lead GPU/TPU Design Engineer
We are seeking a seasoned Senior IP/RTL Design Engineer to lead the design of high-performance TPU and GPU architectures.
About this Role:
This is a critical position for our team, as we aim to create next-generation AI accelerators. The ideal candidate will have extensive experience in ASIC/FPGA IP/RTL design and a deep...
-
GPU Physical Design Engineer
2 weeks ago
Bangalore, Karnataka, India Qualcomm Full time
Company: Qualcomm India Private Limited
Job Area: Engineering Group, Engineering Group > Hardware Engineering
General Summary
Qualcomm GPU team is actively seeking candidates for several physical design engineering positions. Graphics HW team in Bangalore is part of a worldwide team responsible for developing and delivering GPU solutions which are setting the...
-
Principal IP/RTL Design Engineer for GPU
6 days ago
Bangalore, India Mulya Technologies Full time
Principal IP/RTL Design Engineer for TPU / GPU
Hyderabad / Bangalore
Founded by highly respected Silicon Valley veterans, with its design centers established in Santa Clara, California / Hyderabad / Bangalore.
Our pay comprehensively beats "ALL" Semiconductor product players in the Indian market.
Position Overview
Seeking an IP/RTL Design Engineer with 5+...
-
Principal IP/RTL Design Engineer for GPU
1 week ago
Bangalore, India Mulya Technologies Full time
Principal IP/RTL Design Engineer for TPU / GPU
Hyderabad / Bangalore
Founded by highly respected Silicon Valley veterans, with its design centers established in Santa Clara, California / Hyderabad / Bangalore.
Our pay comprehensively beats "ALL" Semiconductor product players in the Indian market.
Position Overview
Seeking an IP/RTL Design Engineer with 5+...
-
Senior GPU Compiler Engineer
3 weeks ago
Bangalore Division, India Best NanoTech Full time
About the Company: Undisputed leader in AI computing
Our client is the world’s leading pioneer in accelerated computing. Originally known for inventing the GPU and revolutionizing gaming, they are now the primary force powering the AI era, providing the infrastructure for everything from self-driving cars to ChatGPT. You will be joining a trillion-dollar...
-
IP/RTL Design Architect for GPU
2 weeks ago
Bangalore, India Mulya Technologies Full time
IP/RTL Design Architect for GPU
Hyderabad / Bangalore
Founded by highly respected Silicon Valley veterans, with its design centers established in Santa Clara, California / Hyderabad / Bangalore.
Our pay comprehensively beats "ALL" Semiconductor product players in the Indian market.
Position Overview
Seeking an IP/RTL Design Engineer with 8+ years of...
-
Senior Data Center Engineer – AI/ML
3 days ago
Bangalore, India DC Tech Consulting Full time
Senior Data Center Engineer – AI/ML & GPU Platforms
Location: Remote
Experience: 7+ Years
Type: Full-time
Role Overview
We are seeking a highly skilled Senior Data Center Compute Engineer to design, build, and operate GPU-enabled compute platforms for AI/ML and high-performance workloads. This role is heavily focused on Kubernetes, virtualization, orchestration...