Member of Technical Staff - Kernels & GPU Performance

$150k - $350k

Gimlet Labs, Inc.

About Us

Gimlet Labs is building the first heterogeneous neocloud for AI workloads. As AI systems scale, the industry is hitting fundamental limits in power, capacity, and cost with today’s homogeneous, vertically integrated infrastructure. Gimlet addresses this by decoupling AI workloads from the underlying hardware. Our platform intelligently partitions workloads into components and orchestrates each component onto the hardware that best fits its performance and efficiency needs. This approach enables heterogeneous systems across multi‑vendor and multi‑generation hardware, including the latest emerging accelerators. These systems unlock step‑function improvements in performance and cost efficiency at scale.

On top of this foundation, Gimlet is building a production‑grade neocloud for agentic workloads. Customers use Gimlet to deploy and manage their workloads through stable, production‑ready APIs, without having to reason about hardware selection, placement, or low‑level performance optimization. Gimlet works with foundation labs, hyperscalers, and AI‑native companies to power real production workloads built to scale to gigawatt‑class AI datacenters.

Mission

Gimlet Labs is seeking a Member of Technical Staff focused on kernels and GPU performance. In this role, you will work close to accelerators and execution hardware to extract maximum performance from AI workloads across diverse and rapidly evolving platforms. You will analyze low‑level execution behavior, design and optimize kernels, and ensure performance is reliable across both established and emerging hardware. This role is ideal for engineers who enjoy deep performance work, reasoning about hardware tradeoffs, and turning theoretical peak performance into real‑world results.

Responsibilities

- Design, implement, and optimize GPU and accelerator kernels for AI workloads
- Analyze and tune performance across the GPU execution stack, including memory access patterns, synchronization, and instruction scheduling
- Work with compilers and runtimes to ensure kernels integrate cleanly and perform well in end‑to‑end systems
- Bring up and optimize execution on new or emerging accelerators
- Profile, benchmark, and debug performance issues across kernels, runtimes, and hardware
- Ensure performance optimizations are robust, correct, and production‑ready at scale

Qualifications

- Strong software engineering fundamentals
- Experience working on performance‑critical systems close to hardware
- Comfort reasoning about low‑level execution behavior, memory hierarchies, and performance tradeoffs

Preferred Qualifications

- Experience with CUDA, Triton, CUTLASS, or other accelerator programming models
- Deep understanding of GPU execution models (warps/wavefronts, blocks, grids)
- Experience optimizing memory access patterns (coalescing, shared memory, cache behavior)
- Familiarity with occupancy, latency hiding, and instruction‑level parallelism
- Experience using profiling and performance analysis tools
- Familiarity with multi‑GPU or distributed execution is a plus

Compensation Range: $150K - $350K


Vacancy posted 5 days ago

