[Remote] Senior AI Kernel Engineer

Remote · USA · Full-time

Note: This is a remote position open to candidates in the USA. Modular is on a mission to revolutionize AI infrastructure by rebuilding the AI software stack. The company is seeking a Senior AI Kernel Engineer to lead the design and optimization of high-performance kernels for AI inference on GPUs and custom accelerators, collaborating closely with compiler and runtime teams to improve performance.

Responsibilities

  • Design, implement, and optimize performance-critical kernels for AI inference workloads (e.g., GEMM, attention, communication, fusion)
  • Lead kernel-level optimization efforts across single-GPU, multi-GPU, and heterogeneous hardware environments
  • Make informed trade-offs between latency, throughput, memory footprint, and numerical precision
  • Drive adoption of new hardware features (e.g., Tensor Cores, asynchronous execution, advanced memory spaces)
  • Analyze performance using profilers, hardware counters, and microbenchmarks; translate insights into concrete improvements
  • Work closely with compiler and runtime teams to influence code generation, scheduling, and kernel fusion strategies
  • Review and mentor other engineers on kernel design, performance tuning, and best practices
  • Contribute to technical roadmaps and long-term performance strategy for AI inference

Skills

  • 5+ years of experience in performance-critical systems or kernel development (or equivalent depth of expertise)
  • Strong proficiency in C/C++ and low-level programming
  • Extensive hands-on experience with GPU kernel programming (CUDA, HIP, or equivalent)
  • Deep understanding of GPU architecture, including memory hierarchies, synchronization, and execution models
  • Proven track record of delivering measurable performance improvements in production systems
  • Strong problem-solving skills and ability to work independently on complex, ambiguous performance challenges
  • Experience with PTX, assembly-level tuning, or code generation frameworks (e.g., Triton)
  • Experience optimizing distributed or multi-GPU inference pipelines
  • Familiarity with custom AI accelerators or domain-specific hardware
  • Understanding of modern AI models (e.g., transformers, LLMs, diffusion) from a systems and performance perspective
  • Contributions to open-source kernel libraries, compilers, or performance tools
  • Experience collaborating directly with hardware or compiler teams

Benefits

  • Premier insurance plans
  • Up to 5% 401k matching
  • Flexible paid time off
  • Stock options
  • Annual target bonus
  • Equity
  • Team Building Events
  • Regular team onsites and local meetups in Los Altos, CA, and other cities
  • Traveling 2-4 times a year is expected for all roles

Company Overview

  • Modular provides AI infrastructure for deployment, serving, and GPU programming. It was founded in 2022 and is headquartered in Palo Alto, California, USA, with a workforce of 51-200 employees. Its website is https://www.modular.com.

Company H1B Sponsorship

  • Modular has a track record of offering H1B sponsorships: 3 in 2026, 10 in 2025, 6 in 2024, 8 in 2023, and 4 in 2022. Note that this does not guarantee sponsorship for this specific role.