Member of Technical Staff - Kernels & GPU Performance

Gimlet Labs

About Us

Gimlet is building the next generation of AI infrastructure: large-scale AI datacenters and the orchestration platform that coordinates them.

The future of AI will require vastly more compute than exists today. But as AI workloads become more complex and new hardware architectures emerge, simply deploying more GPUs isn't enough. The challenge is making increasingly diverse compute work together.

Gimlet's platform intelligently partitions and routes workloads across heterogeneous hardware, enabling step-function improvements in performance and efficiency. Customers deploy through production-grade APIs without needing to think about hardware selection, placement, or optimization.

We work with foundation labs, hyperscalers, and AI-native companies to power production workloads at massive scale and help define the infrastructure layer for the future of AI.

About the role

Gimlet Labs is seeking a Member of Technical Staff focused on kernels and GPU performance. In this role, you will work close to accelerators and execution hardware to extract maximum performance from AI workloads across diverse and rapidly evolving platforms. You will analyze low-level execution behavior, design and optimize kernels, and ensure performance is reliable across both established and emerging hardware.

This role is ideal for engineers who enjoy deep performance work, reasoning about hardware tradeoffs, and turning theoretical peak performance into real-world results.

What you will work on

Design, implement, and optimize GPU and accelerator kernels for AI workloads
Analyze and tune performance across the GPU execution stack, including memory access patterns, synchronization, and instruction scheduling
Work with compilers and runtimes to ensure kernels integrate cleanly and perform well in end-to-end systems
Bring up and optimize execution on new or emerging accelerators
Profile, benchmark, and debug performance issues across kernels, runtimes, and hardware
Ensure performance optimizations are robust, correct, and production-ready at scale

You may be a good fit if you have

Strong software engineering fundamentals
Experience working on performance-critical systems close to hardware
Comfort reasoning about low-level execution behavior, memory hierarchies, and performance tradeoffs

Strong candidates may also have

Experience with CUDA, Triton, CUTLASS, or other accelerator programming models
Deep understanding of GPU execution models (warps/wavefronts, blocks, grids)
Experience optimizing memory access patterns (coalescing, shared memory, cache behavior)
Familiarity with occupancy, latency hiding, and instruction-level parallelism
Experience using profiling and performance analysis tools
Familiarity with multi-GPU or distributed execution

What Makes Gimlet Different

At Gimlet, you will work on infrastructure problems that span the full stack of modern AI systems. Our team operates across datacenters, networking, distributed systems, compilers, runtimes, orchestration, and performance engineering to build the foundation for the next generation of AI infrastructure.

As an early member of the team, you will have significant ownership, work alongside highly technical engineers, and help shape both the systems we build and how we scale the company.

We value people who are excited to work across domains, take ownership of meaningful problems, and build technology that enables the next generation of AI.
