RLEE - Low-Level Engineering & Kernel Inference Optimization

$90 - $125 per hour

Open Data Science

RL Environments, Kernel Optimization, GPU/CUDA, Compilers (LLVM/MLIR), PyTorch Extensions, Distributed Inference (vLLM/NCCL)

Brief Description of the Role

We're hiring Low-Level Engineers to design and build RL environments that teach LLMs kernel development, hardware optimization, and systems programming. The goal is to create realistic feedback loops where models learn to write high-performance code across GPU and CPU architectures. This is a remote contractor role; at least 4 hours of overlap with PST and advanced English (C1/C2) are required.

About the Company

Preference Model is building the next generation of training data to power the future of AI. Today's models are powerful but fail to reach their potential across diverse use cases because so many of the tasks that we want to use these models for are out of distribution. Preference Model creates RL environments where models encounter research and engineering problems, iterate, and learn from realistic feedback loops. Our founding team has previous experience on Anthropic's data team building data infrastructure, tokenizers, and datasets behind the Claude model. We are partnering with leading AI labs to push AI closer to achieving its transformative potential. The company is backed by Tier 1 Silicon Valley VC.

Responsibilities
  • Design and build MLE/SWE environments and diverse tasks.
  • Target a specified language model and satisfy the required difficulty distribution.

Requirements

Minimal Qualifications
  • Strong Python (engineering-quality, not notebook-only)
  • Clear understanding of LLMs and their current limitations
  • Ability to meet throughput expectations and respond quickly to feedback

You may be a good fit if one of the following applies
  • Deep understanding of memory hierarchies (registers, L1/L2/shared memory, HBM, system RAM) and their performance implications
  • Threading models, synchronization primitives, and concurrent programming (warps, thread blocks, barriers, atomics)
  • Cache coherence, memory access patterns, coalescing, and bank conflicts
  • AOT compilation and optimization passes (LLVM, MLIR, TVM)
  • Compiler and kernel frameworks such as CUTLASS, BitBLAS, or JAX/Pallas
  • Modern C++, including templates, concurrency, and build systems
  • Assembly-level programming and low-level optimization across GPU and CPU architectures (e.g., x86, ARM, NVIDIA Hopper, NVIDIA Blackwell)
  • Debugging and optimizing GPU kernels using CUDA and/or HIP/ROCm
  • Developing PyTorch custom operators, backend extensions, or dispatcher integrations (e.g., ATen, TorchScript, or custom backends)
  • Customizing, extending, or optimizing vLLM, including distributed inference workflows
  • GPU communication libraries and collectives, such as NVIDIA NCCL, AMD RCCL, MPI, or UCX
  • Mixed-precision and low-precision kernels (e.g., FP16, BF16, FP8, INT8), including numerical stability and performance trade-offs

Working conditions
  • Hourly contractor rate: 90-125 USD/hour (dependent on the expertise level and quality of take-home assignment).

Vacancy posted 5 days ago