Data Scientist

10 hours ago


Kakinada, India | FPX AI | Full time
Role Overview

FPX is building an AI infrastructure marketplace that enables developers to access and deploy compute efficiently. As a PyTorch + CUDA Engineer focused on benchmarking and performance, you will design, run, and interpret benchmarks across model, framework, and hardware stacks. Your role is critical to validating performance claims, detecting regressions, and guiding optimizations so that FPX remains at the cutting edge of compute efficiency.

You will collaborate closely with ML systems, compiler, hardware, and platform teams. You should have strong experience with PyTorch internals, GPU programming, and profiling tools.


Key Responsibilities
  • Define, build, and maintain a benchmark suite covering representative deep learning workloads (training, inference, mixed) across modalities (vision, NLP, recommendation, etc.).
  • Automate benchmark runs across multiple hardware configurations (NVIDIA GPUs, possibly AMD, and future accelerators).
  • Use profiling, tracing, and performance tools (e.g. Nsight Systems, Nsight Compute, PyTorch Profiler, CUPTI, NVTX) to identify bottlenecks across layers (operator, kernel, memory, data movement).
  • Write and maintain scripts / harnesses that manage benchmark orchestration, result collection, and analysis (latency, throughput, memory usage, utilization metrics).
  • Detect and triage performance regressions (e.g. through nightly or CI-integrated benchmarks).
  • Partner with compiler, runtime, and kernel teams to propose optimizations: micro-benchmarked kernel patches, fusion, operator-level improvements, or configuration tuning.
  • Validate performance improvements across scale (multi-GPU, distributed) and in production-like settings.
  • Publish benchmark results, document methodology, and communicate trade-offs to stakeholders (engineering, product, customers).
  • Occasionally assist with custom kernel development (e.g. fused kernels, optimized CUDA code) or with integrating specialized libraries (Triton, CUTLASS, cuBLAS, cuDNN).
  • Stay up-to-date on new features in PyTorch (e.g. torch.compile, CUDA Graphs, new backends) and evaluate their impact.


Required Qualifications
  • BS / MS / PhD in Computer Science, Electrical Engineering, or equivalent experience.
  • Solid experience (3+ years) in GPU programming: CUDA, kernel development, memory management, concurrency.
  • Deep familiarity with PyTorch internals (operators, autograd, dispatcher, JIT/Inductor pipeline or equivalent).
  • Experience with profiling and analysis of GPU workloads (Nsight, CUPTI, NVTX, PyTorch Profiler).
  • Strong Python and C++ skills.
  • Ability to analyze low-level performance (latency, throughput, memory, occupancy) and correlate it with high-level model behavior.
  • Experience writing benchmark harnesses, automation, and result pipelines.
  • Excellent communication skills, including the ability to present performance trade-offs and complex analysis to technical and non-technical audiences.


Preferred / Nice-to-Have
  • Experience with distributed training/inference (DDP, FSDP, model parallelism).
  • Experience with PyTorch’s newer compilation pathways (e.g. torch.compile, Inductor, Dynamo).
  • Knowledge of CUDA Graphs, kernel fusion, memory optimizations, tensor core usage.
  • Experience with other ML frameworks and formats (TensorFlow, JAX, ONNX) and with baseline comparisons against them.
  • Published benchmarks, open-source contributions, or performance tools development.
  • Prior experience in systems, compilers, or GPU runtime development.
  • Familiarity with scaling benchmarks, cluster deployments, and heterogeneous hardware.


Compensation
  • Competitive salary + equity + benefits.
  • Potential for bonuses tied to performance improvements and critical benchmark delivery.
