Member of Technical Staff - Kernels & GPU Performance

Gimlet Labs

Gimlet Labs is building the first heterogeneous neocloud for AI workloads. As AI systems scale, the industry is hitting fundamental limits in power, capacity, and cost with today’s homogeneous, vertically integrated infrastructure. Gimlet addresses this by decoupling AI workloads from the underlying hardware. Our platform intelligently partitions workloads into components and orchestrates each component to hardware that best fits its performance and efficiency needs. This approach enables heterogeneous systems across multi-vendor and multi-generation hardware, including the latest emerging accelerators. These systems unlock step-function improvements in performance and cost efficiency at scale. On top of this foundation, Gimlet is building a production-grade neocloud for agentic workloads. Customers use Gimlet to deploy and manage their workloads through stable, production-ready APIs, without having to reason about hardware selection, placement, or low-level performance optimization. Gimlet works with foundation labs, hyperscalers, and AI-native companies to power real production workloads built to scale to gigawatt-class AI datacenters.

Gimlet Labs is seeking a Member of Technical Staff focused on kernels and GPU performance. In this role, you will work close to accelerators and execution hardware to extract maximum performance from AI workloads across diverse and rapidly evolving platforms. You will analyze low-level execution behavior, design and optimize kernels, and ensure performance is reliable across both established and emerging hardware. This role is ideal for engineers who enjoy deep performance work, reasoning about hardware tradeoffs, and turning theoretical peak performance into real-world results.
Responsibilities
  • Design, implement, and optimize GPU and accelerator kernels for AI workloads
  • Analyze and tune performance across the GPU execution stack, including memory access patterns, synchronization, and instruction scheduling
  • Work with compilers and runtimes to ensure kernels integrate cleanly and perform well in end-to-end systems
  • Bring up and optimize execution on new or emerging accelerators
  • Profile, benchmark, and debug performance issues across kernels, runtimes, and hardware
  • Ensure performance optimizations are robust, correct, and production-ready at scale

Qualifications
  • Strong software engineering fundamentals
  • Experience working on performance-critical systems close to hardware
  • Comfort reasoning about low-level execution behavior, memory hierarchies, and performance tradeoffs

Preferred Qualifications
  • Experience with CUDA, Triton, CUTLASS, or other accelerator programming models
  • Deep understanding of GPU execution models (warps/wavefronts, blocks, grids)
  • Experience optimizing memory access patterns (coalescing, shared memory, cache behavior)
  • Familiarity with occupancy, latency hiding, and instruction-level parallelism
  • Experience using profiling and performance analysis tools
  • Familiarity with multi-GPU or distributed execution is a plus
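The qualifications list memory-access optimization, including coalescing. The sketch below is a rough, simplified model of that idea in Python, not a description of any specific GPU. It counts how many aligned memory sectors one warp touches when lane i loads the element at index i * stride. Several values are assumed defaults rather than vendor specifications: the 32-lane warp, the 4-byte elements, and the 32-byte sector size. Real access granularity and caching vary by vendor and generation. The function names are hypothetical.

```python
# Simplified model of global-memory coalescing on a GPU: count the aligned
# memory sectors one warp's load touches when lane i reads element i * stride.
# Warp size, element size, and sector size are assumed defaults, not values
# for any specific GPU; caches and other hardware effects are ignored.


def _warp_access(stride_elems, elem_bytes, warp_size, sector_bytes, base_addr=0):
    """Return (distinct bytes requested, distinct sectors touched) for one warp."""
    requested = set()
    sectors = set()
    for lane in range(warp_size):
        start = base_addr + lane * stride_elems * elem_bytes
        end = start + elem_bytes - 1  # last byte this lane reads
        requested.update(range(start, end + 1))
        sectors.update(range(start // sector_bytes, end // sector_bytes + 1))
    return requested, sectors


def sectors_touched(stride_elems: int, elem_bytes: int = 4,
                    warp_size: int = 32, sector_bytes: int = 32) -> int:
    """Number of distinct sectors the warp's load must fetch."""
    _, sectors = _warp_access(stride_elems, elem_bytes, warp_size, sector_bytes)
    return len(sectors)


def load_efficiency(stride_elems: int, elem_bytes: int = 4,
                    warp_size: int = 32, sector_bytes: int = 32) -> float:
    """Fraction of fetched bytes that the warp actually requested."""
    requested, sectors = _warp_access(stride_elems, elem_bytes, warp_size, sector_bytes)
    return len(requested) / (len(sectors) * sector_bytes)


if __name__ == "__main__":
    for stride in (1, 2, 4, 32):
        print(f"stride={stride:2d}  sectors={sectors_touched(stride):2d}  "
              f"efficiency={load_efficiency(stride):.3f}")
```

The two strides behave very differently under these assumed defaults:

  • A unit stride touches 4 sectors, and every fetched byte is used.
  • A stride of 32 elements touches 32 sectors, and only 1/8 of the fetched bytes are used.

In this model, that gap is why kernels are arranged so that adjacent threads read adjacent addresses.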

Vacancy posted 2 days ago
Similar jobs that could be interesting for you, based on the Member of Technical Staff - Kernels & GPU Performance vacancy in San Francisco, CA
  • $225k

     ...compute to achieve this goal. About The Role As a Kernel Engineer, you will design, implement, and maintain high-performance kernels to optimize throughput and latency...  ...Blackwell or Google TPUs Develop and optimize GPU kernels in frameworks such as NCCL, MSCCLPP, CUTLASS... 
    Performance
    Relocation
    Visa sponsorship

    Magic

    San Francisco, CA
    5 days ago
  •  ...reliably than humans can alone. Our technical approach combines frontier-...  .... About the Role As a Kernel Engineer, you will design, implement, and maintain high-performance kernels that optimize throughput...  ...Experience developing and optimizing GPU or accelerator kernels using... 
    Performance
    Work at office
    Visa sponsorship
    Relocation package
    Flexible hours

    Acceler8 Talent

    San Francisco, CA
    1 day ago
  •  ...challenges and the wins. What You'll Do Bring deep kernel expertise to our AI agents that optimize high-performance, mission-critical computing systems. You'll shape...  ...or optimizing kernels for ML or other GPU-heavy workloads Fluency in Python and C/C++, and... 
    Performance
    Work at office
    Flexible hours

    Asari AI

    San Francisco, CA
    1 day ago
  •  ...Member of Technical Staff, Model Efficiency Who are we? Our mission is to scale intelligence...  ...inference stack to improve core performance metrics by diving deep into model...  ...performance techniques, including GPU/CUDA optimizations, kernel-level improvements, and model execution... 
    Performance
    Full time
    Work at office
    Remote work
    Flexible hours

    Cohere

    San Francisco, CA
    4 days ago
  •  ...With a focused team, breakthrough performance doesn't require breakthrough compute...  ...that matter, and join the team. As a Member of Technical Staff with a focus on Multimodal AI, you will...  ...: Experience in writing efficient GPU kernels using CUDA, optimising performance... 
    Performance
    Full time
    Work at office
    Remote work
    Flexible hours

    Cohere

    San Francisco, CA
    3 days ago
  • $150k - $300k

     ...training stack. Core Technical Responsibilities LLM Serving...  ...across our cloud GPU fleets. GPU‑Aware...  ...Inference Optimization & Performance Framework Development...  ...Performance: Profile kernels, memory bandwidth and...  ...development and encourage team members to contribute to the... 
    Performance
    Work at office
    Remote work
    Visa sponsorship
    Relocation package
    Flexible hours
    Shift work

    Prime Intellect

    San Francisco, CA
    4 days ago
  • $150k - $300k

     ...distributed system with performance engineering at its...  ...skills, from deep Linux kernel topics to high-level distributed...  ...at scale. Core Technical Responsibilities...  ...heterogeneous hardware (CPU, GPU, TPU) Platform...  ...development and encourage team members to contribute to the... 
    Performance
    Full time
    Work at office
    Remote work
    Visa sponsorship
    Relocation package
    Flexible hours

    Kubelt

    San Francisco, CA
    5 days ago
  •  ...The Role We're looking for a Member of Technical Staff - Data & ML Infrastructure Engineer...  ...regressions. You'll work across GPU kernels, inference systems, distributed training...  ...Production AI deployment Performance engineering This role emerged directly... 
    Performance

    Moonlake AI

    San Francisco, CA
    4 days ago
  •  ...efficiency Dataloaders, fusion, activation remat, gradient checkpointing. FSDP/ZeRO/tensor+pipeline parallel; NCCL tuning. GPU + kernel performance Nsight profiling, Triton/CUDA kernels, fused ops. Flash-attention-style speedups, sequence packing, KV-cache tricks.... 
    Performance

    Embedding VC

    San Francisco, CA
    3 days ago
  •  ...optimize distributed inference systems on GPU clusters, pushing throughput with large-...  ...efficient low-level code (CUDA, Triton, custom kernels) and integrate it seamlessly into high-...  ...systems, ML infrastructure, or high-performance serving (8+ years) Production-grade expertise... 
    Performance
    Remote job

    Genesis AI

    San Francisco, CA
    4 days ago
  • Member of Technical Staff, ML Systems Mirendil Mirendil is a tech-first company focused on solving core bottlenecks that unlock step...  ...models Designing and implementing custom ML kernels Optimizing performance (latency, throughput, cost) Developing data pipelines... 
    Performance

    Mirendil

    San Francisco, CA
    1 day ago
  •  ...component to hardware that best fits its performance and efficiency needs. This approach...  .... Gimlet Labs is seeking a Member of Technical Staff focused on compilers. In this role, you...  ...spanning graph-level, tensor-level, and kernel-level representations Implement partitioning... 
    Performance

    Gimlet Labs

    San Francisco, CA
    2 days ago
  •  ...optimizes AI itself. Our journey starts with GPU kernels, but will expand into every corner of...  ...systems that help the agent diagnose performance bottlenecks Ship features that...  ...You're a strong fit if you: Have deep technical intuition and can learn new domains quickly... 
    Performance
    Remote work

    Wafer

    San Francisco, CA
    1 day ago
  •  ...the foundation model training stack, from data pipelines to GPU kernels Design, build, and optimize distributed training systems (PyTorch...  ...tools for large-scale runs, enabling rapid diagnosis of performance regressions and failures What You’ll Bring Deep experience... 
    Performance
    Remote job

    Genesis AI

    San Francisco, CA
    4 days ago
  •  ...component to hardware that best fits its performance and efficiency needs. This approach...  .... Mission Gimlet Labs is seeking a Member of Technical Staff focused on ML systems and inference....  ...boundaries. Work closely with compilers, kernels, networking, and distributed systems... 
    Performance

    Gimlet Labs, Inc.

    San Francisco, CA
    4 days ago
  • $170k - $220k

    Member of Technical Staff - Infrastructure & LLMs Location: San Francisco, CA (Hybrid) Compensation:...  ...strong engineer to join a lean, high-performance team building next-generation inference...  ...directly on problems like: Scaling multi-GPU inference workloads Designing... 
    Performance
    Full time
    Temporary work
    Immediate start
    Visa sponsorship
    Work visa

    Amadeus Search

    San Francisco, CA
    5 days ago
  • $300k

    Member of Technical Staff - RL Infrastructure About V max V max is an applied research lab developing...  ...and/or RL training. Experience with GPU clusters, distributed training, model...  ...observability, testing, debugging, and performance optimization. Ability to work closely... 
    Performance
    Work at office
    Local area

    Vmax

    San Francisco, CA
    5 days ago
  • $225k

     ...large-scale model training across massive GPU clusters. You will work at the boundary...  ...systems, ensuring that training runs are performant, reliable, and reproducible under extreme...  ...and training throughput Collaborate with Kernels and Research to align model architecture... 
    Performance
    Relocation
    Visa sponsorship

    Magic

    San Francisco, CA
    5 days ago
  •  ...Design, build, and operate large-scale GPU infrastructure for high-throughput model...  ...learning pipelines at scale. Build high-performance inference platforms capable of serving and...  ...Improve performance of model execution through kernel-level optimization, model parallelism... 
    Performance
    Relocation package

    Reflection

    San Francisco, CA
    2 days ago
  • $150k - $300k

     ...fine-tuning runs on managed GPU clusters with a single API call...  ...that runs the jobs. Core Technical Responsibilities Hosted Training...  ...: networking, namespaces, performance tuning Programming & Platform...  ...development and encourage team members to contribute to the broader... 
    Performance
    Work at office
    Local area
    Remote work
    Visa sponsorship
    Relocation package
    Flexible hours

    Prime Intellect

    San Francisco, CA
    10 days ago
  •  ...possible. You will work directly with the technical lead on problems that require deep...  ...and directly impacts model performance on real devices. While San Francisco...  ...Work Implement and optimize inference kernels for CPU, NPU, and GPU architectures across diverse edge hardware... 
    Performance

    Liquid AI

    San Francisco, CA
    1 day ago
  •  ...orchestrates each component to hardware that best fits its performance and efficiency needs. This approach enables heterogeneous systems...  ...AI datacenters. Mission Gimlet Labs is seeking a Member of Staff focused on AI Research (Intern). As an AI Researcher (Intern... 
    Performance
    Internship

    Gimlet Labs

    San Francisco, CA
    4 days ago
  • $200k - $350k

     ...Member of ML Technical Staff Title of Role: Member of ML Technical Staff Location: San Francisco, onsite Company Stage of Funding...  ...continuous improvement of engineering practices. Analyze model performance and implement improvements based on quantitative metrics.... 
    Performance
    Work at office
    Visa sponsorship

    Recruiting from Scratch

    San Francisco, CA
    1 day ago
  • $150k - $350k

     ...Job Description Member of Technical Staff, Applied Research — Sieve Location: San Francisco, CA (Onsite) Compensation...  ...pure academic research. Core ownership includes: High-performance video understanding building blocks Large-scale video... 
    Performance
    Full time
    H1b
    Visa sponsorship

    David Joseph & Company

    San Francisco, CA
    13 days ago
  • $200k

     ...Join to apply for the Member of Technical Staff role at Listen Labs. TL;DR: We are seeing strong market demand and an aggressive 6‑month product...  ...enterprise wins at Google, Microsoft, Nestlé, and P&G. Performance: 83% win rate on deals with no losses to competitors. Market... 
    Performance
    Flexible hours

    Listen Labs

    San Francisco, CA
    4 days ago
  • $220k

     ...scheduling and KV-cache management to support in API Gateway. GPU kernels migration to CuTe DSL. Port our in-house CUDA kernels to...  ...all Python pains and keep up with rapidly growing traffic. Performance optimisation. Profile and fix bottlenecks from network ingress... 
    Performance

    Perplexity

    San Francisco, CA
    4 days ago
  • $350k

     ...Neocloud Platform | On-site (San Francisco) We’re hiring a Member of Technical Staff - Distributed Systems to join a next-generation AI...  ...tolerance Instrument systems for monitoring, debugging, and performance at scale Collaborate across compilers, runtimes, and hardware... 
    Performance

    Acceler8 Talent

    San Francisco, CA
    3 days ago
  • $150k - $350k

    Mission Gimlet Labs is seeking a Member of Technical Staff focused on distributed systems. In this role, you will build the core platform that...  ...and hardware to ensure end‑to‑end system correctness and performance Qualifications Strong software engineering fundamentals... 
    Performance

    Gimlet Labs, Inc.

    San Francisco, CA
    2 days ago
  •  ...orchestrates each component to hardware that best fits its performance and efficiency needs. This approach enables heterogeneous...  ...to gigawatt-class AI datacenters. Gimlet Labs is seeking a Member of Technical Staff focused on distributed systems. In this role, you will... 
    Performance

    Gimlet Labs

    San Francisco, CA
    2 days ago
  •  ...infrastructure foundation for AI teams. With instant GPU access, sub-second container startups,...  ..., designing systems to measure performance, and translating results into product...  ...looking for the following: Sufficient technical skills to design and implement scalable... 
    Performance
    Work at office

    Modal

    San Francisco, CA
    4 days ago
