Member of Technical Staff - Kernels & GPU Performance
Gimlet Labs
Gimlet Labs is building the first heterogeneous neocloud for AI workloads. As AI systems scale, the industry is hitting fundamental limits in power, capacity, and cost with today’s homogeneous, vertically integrated infrastructure. Gimlet addresses this by decoupling AI workloads from the underlying hardware. Our platform intelligently partitions workloads into components and orchestrates each component to hardware that best fits its performance and efficiency needs. This approach enables heterogeneous systems across multi-vendor and multi-generation hardware, including the latest emerging accelerators. These systems unlock step-function improvements in performance and cost efficiency at scale.

On top of this foundation, Gimlet is building a production-grade neocloud for agentic workloads. Customers use Gimlet to deploy and manage their workloads through stable, production-ready APIs, without having to reason about hardware selection, placement, or low-level performance optimization. Gimlet works with foundation labs, hyperscalers, and AI native companies to power real production workloads built to scale to gigawatt-class AI datacenters.

Gimlet Labs is seeking a Member of Technical Staff focused on kernels and GPU performance. In this role, you will work close to accelerators and execution hardware to extract maximum performance from AI workloads across diverse and rapidly evolving platforms. You will analyze low-level execution behavior, design and optimize kernels, and ensure performance is reliable across both established and emerging hardware. This role is ideal for engineers who enjoy deep performance work, reasoning about hardware tradeoffs, and turning theoretical peak performance into real-world results.

Responsibilities
- Design, implement, and optimize GPU and accelerator kernels for AI workloads
- Analyze and tune performance across the GPU execution stack, including memory access patterns, synchronization, and instruction scheduling
- Work with compilers and runtimes to ensure kernels integrate cleanly and perform well in end-to-end systems
- Bring up and optimize execution on new or emerging accelerators
- Profile, benchmark, and debug performance issues across kernels, runtimes, and hardware
- Ensure performance optimizations are robust, correct, and production-ready at scale

Qualifications
- Strong software engineering fundamentals
- Experience working on performance-critical systems close to hardware
- Comfort reasoning about low-level execution behavior, memory hierarchies, and performance tradeoffs

Preferred Qualifications
- Experience with CUDA, Triton, CUTLASS, or other accelerator programming models
- Deep understanding of GPU execution models (warps/wavefronts, blocks, grids)
- Experience optimizing memory access patterns (coalescing, shared memory, cache behavior)
- Familiarity with occupancy, latency hiding, and instruction-level parallelism
- Experience using profiling and performance analysis tools
- Familiarity with multi-GPU or distributed execution is a plus


