RLEE - Low-Level Engineering & Kernel Inference Optimization

$90 - $125 per hour

Open Data Science

RL Environments, Kernel Optimization, GPU/CUDA, Compilers (LLVM/MLIR), PyTorch Extensions, Distributed Inference (vLLM/NCCL)

Brief Description of the Role

We're hiring Low-Level Engineers to design and build RL environments that teach LLMs kernel development, hardware optimization, and systems programming. The goal is to create realistic feedback loops where models learn to write high-performance code across GPU and CPU architectures. This is a remote contractor role; at least 4 hours of overlap with PST and advanced English (C1/C2) are required.

About the Company

Preference Model is building the next generation of training data to power the future of AI. Today's models are powerful but fail to reach their potential across diverse use cases because so many of the tasks we want to use them for are out of distribution. Preference Model creates RL environments where models encounter research and engineering problems, iterate, and learn from realistic feedback loops. Our founding team has previous experience on Anthropic's data team building data infrastructure, tokenizers, and datasets behind the Claude model. We are partnering with leading AI labs to push AI closer to achieving its transformative potential. The company is backed by Tier 1 Silicon Valley VC.

Responsibilities

  • Design and build MLE/SWE environments and diverse tasks.
  • Target a specified language model and satisfy the required difficulty distribution.

Requirements

Minimal Qualifications

  • Strong Python (engineering-quality, not notebook-only)
  • Clear understanding of LLMs and their current limitations
  • Ability to meet throughput expectations and respond quickly to feedback

You may be a good fit if one of the following applies

  • Deep understanding of memory hierarchies (registers, L1/L2/shared memory, HBM, system RAM) and their performance implications
  • Threading models, synchronization primitives, and concurrent programming (warps, thread blocks, barriers, atomics)
  • Cache coherence, memory access patterns, coalescing, and bank conflicts
  • AOT compilation and optimization passes (LLVM, MLIR, TVM)
  • Compiler and kernel frameworks such as CUTLASS, BitBLAS, or JAX/Pallas
  • Modern C++, including templates, concurrency, and build systems
  • Assembly-level programming and low-level optimization across GPU and CPU architectures (e.g., x86, ARM, NVIDIA Hopper, NVIDIA Blackwell)
  • Debugging and optimizing GPU kernels using CUDA and/or HIP/ROCm
  • Developing PyTorch custom operators, backend extensions, or dispatcher integrations (e.g., ATen, TorchScript, or custom backends)
  • Customizing, extending, or optimizing vLLM, including distributed inference workflows
  • GPU communication libraries and collectives, such as NVIDIA NCCL, AMD RCCL, MPI, or UCX
  • Mixed-precision and low-precision kernels (e.g., FP16, BF16, FP8, INT8), including numerical stability and performance trade-offs

Working conditions

Hourly contractor rate: 90-125 USD/hour (dependent on expertise level and the quality of the take-home assignment).

Vacancy posted 1 day ago
Similar jobs that could be interesting for you
Based on the RLEE - Low-Level Engineering & Kernel Inference Optimization vacancy in San Francisco, CA
  • $90 - $125 per hour

    A cutting-edge AI company is looking for Low-Level Engineers to design RL environments that optimize kernel development and systems programming. Candidates should have strong Python skills and a solid understanding of LLMs. This remote contractor role offers an hourly... 
    Suggested
    Remote job
    Hourly pay
    For contractors

    Open Data Science

    San Francisco, CA
    1 day ago
  • $160k - $230k

     ...efficient and scalable inference for large language...  ...Our mission is to optimize inference...  ...and Optimization Engineer to design, develop...  ...This role focuses on low‑latency, high‑throughput...  ...graph, compiled kernels, and efficient...  ...determined by location, level and role. Equal... 
    Suggested
    Full time

    Together AI

    San Francisco, CA
    3 days ago
  • Liquid AI is seeking a Systems Programmer to join their Edge Inference team in San Francisco. In this role, you will implement and optimize inference kernels on various hardware, ensuring efficiency and performance. Ideal candidates have over 5 years of systems programming... 
    Suggested
    Flexible hours

    Liquid AI

    San Francisco, CA
    1 day ago
  •  ...Model is building automated ML research engineering. Existing frontier models are brittle...  ...Machine Learning Engineers for our Low Level / Kernels Capabilities team. The Kernels team builds...  ...experience: you write kernels and optimize them iteratively against a profiler.... 
    Suggested
    Visa sponsorship
    Relocation package

    Preference Model

    San Francisco, CA
    3 days ago
  • Gimlet Labs, Inc. is seeking a Senior Technical Recruiter to focus on recruiting for specialized roles in kernel, compiler, and low-level systems engineering. The ideal candidate will possess over 5 years of technical recruiting experience and a proven ability to source... 
    Suggested

    Gimlet Labs, Inc.

    San Francisco, CA
    3 days ago
  •  ...hands-on support from AMD engineers the team is scaling...  ...Distributed Training and Inference Engineer to build, optimize, and maintain the critical...  ...learning infrastructure from low-level CUDA/ROCm runtimes to high...  ...runtime failures, and kernel-level inconsistencies. Collaborate... 
    Flexible hours

    Sciforium

    San Francisco, CA
    3 days ago
  • Gravity Engineering Services Pvt Ltd. is looking for an Inference Frameworks and Optimization Engineer to enhance the performance of AI infrastructure. This role involves designing...  ...multimodal models, optimizing frameworks for low-latency and high-throughput performance. The... 

    Gravity Engineering Services Pvt Ltd.

    San Francisco, CA
    1 day ago
  •  ...ABOUT THE ROLE You’ll write and optimize the GPU kernels and supporting systems software that makes our training and inference workloads fast. This is deep, low-level work (performance counters,...  ...actually use. We hire kernel engineers because the gap between "this works... 
    Shift work

    MakerMaker.AI

    San Francisco, CA
    2 days ago
  • $225k

    Magic is hiring a Kernel Engineer in San Francisco to design and maintain high-performance kernels that optimize throughput and latency during AI training and inference. The ideal candidate has low-level programming expertise, particularly for AI accelerators like NVIDIA... 

    Magic Inc

    San Francisco, CA
    4 days ago
  • $167.2k - $209k

     ...DigitalOcean is seeking a Senior Engineer 2 to play a key technical role in our AI Inference Optimization team. DigitalOcean aims to be...  ...the inference engine and GPU kernel layers, ensuring our...  ...skills, particularly related to low-level GPU programming - optimization... 
    Local area
    Remote work
    Worldwide
    Flexible hours

    DigitalOcean

    San Francisco, CA
    2 days ago
  • $100k - $120k

     ...foundation models. As training and inference workloads grow, we need kernel‑level innovations to reduce latency,...  ...s founding team to architect and optimize low‑level compute kernels, drivers, and...  ...Lead a team of kernel and system engineers focused on performance-critical... 

    Coda Robotics

    San Francisco, CA
    2 days ago
  • OpenAI is seeking a GPU Inference Engineer based in San Francisco, CA. In this high-impact role, you'll optimize inference performance and scalability for Robotics...  ...expertise in model performance optimization, kernel-level systems, and low-level performance tuning. The... 
    Work at office
    Relocation
    Relocation package

    OpenAI

    San Francisco, CA
    2 days ago
  • $315k

     ...committed researchers, engineers, policy experts, and...  ...About the Role As a TPU Kernel Engineer, you'll be...  ...research, training, and inference. A significant portion...  ...involve designing and optimizing kernels for the TPU. You...  ...systems problems and low-level optimization. You may... 
    Contract work
    For contractors
    For subcontractor
    Work at office
    Relocation
    Visa sponsorship
    Work visa
    Flexible hours

    Menlo Ventures

    San Francisco, CA
    3 days ago
  •  ...and pushing towards AGI‑level intelligence in...  ...’re looking for a GPU Inference Engineer to contribute to improvements...  ...drive initiatives to optimize inference performance...  ...optimizations from a kernel and data movement perspective...  ..., data movement, and low‑level performance... 
    Work at office
    Relocation package

    OpenAI

    San Francisco, CA
    2 days ago
  • Requirements Worked on system optimizations for model serving, such as batching...  ..., and parallelism, Worked on low-level optimizations for inference, such as GPU kernels and code generation, Worked on...  ...on large-scale inference engines or reinforcement learning frameworks... 

    xAI

    San Francisco, CA
    2 days ago
  •  ...hardware, ensuring low latency, minimal...  ...Opportunity Our Edge Inference team compiles...  ...Foundation Models into optimized machine code that...  ...at the hardware level: You understand cache...  ...inference kernels for CPU, NPU, and...  ...Embedded software engineering experience or work... 

    Liquid AI

    San Francisco, CA
    1 day ago
  • $285k - $315k

     ...looking for a Founding GPU Kernel Engineer who lives right at the boundary...  ...knowledge into compiler optimization passes that help every model...  ...Profile at the microarchitectural level: look into SM utilization,...  ...) Strong skills with low-level profiling tools: Nsight... 
    Full time
    Work at office
    Relocation package

    SF Tensor

    San Francisco, CA
    4 days ago
  • $200k

    Plaud is seeking skilled AI engineers to join their core SpeechLLM lab in San Francisco. You will play a crucial role in building high-throughput inference engines for conversational AI and optimizing GPU performance while collaborating with various teams. The position... 
    Work at office

    Plaud

    San Francisco, CA
    4 days ago
  •  ...Francisco seeks a Staff Software Engineer to lead kernel-level performance engineering...  ...involves designing and optimizing high-performance GPU...  ...performance roadmaps for low-level compute paths. Ideal...  ...on pushing the frontier of inference performance.

    Databricks

    San Francisco, CA
    3 days ago
  •  ...software developers of all skill levels. We're commercializing Ray, a...  ...About the role As a Distributed LLM Inference Engineer, you will help with systems and optimizations that push the boundaries of...  ...providing optimizations achieving low-cost solutions for large scale ML... 
    Work at office

    Anyscale

    San Francisco, CA
    4 days ago
  • $160k - $230k

    Together AI is seeking an Inference Frameworks and Optimization Engineer in San Francisco, California. The role focuses on designing and optimizing distributed inference engines, ensuring efficient deployment of large language models and vision models. The ideal candidate... 

    Together AI

    San Francisco, CA
    3 days ago
  • $300k

     ...Description GPU Optimisation Engineer - Real-Time Inference Want to push GPU...  ...? This team is building low-latency AI systems where milliseconds...  ...GPUs at an architectural level. Someone who knows where...  ...lost: memory hierarchy, kernel launch overhead, occupancy... 
    Relocation
    Visa sponsorship
    Free visa

    Techire Ai

    San Francisco, CA
    2 days ago
  •  ...build and operate the inference systems that serve our...  ...infrastructure, runtime optimization, and the long tail of...  .... This is an engineering role, not a research role...  ...(quantization, custom kernels, scheduling improvements...  ...reading and writing systems-level code in at least one... 

    MakerMaker

    San Francisco, CA
    1 day ago
  • $300k

     ...leading technology firm in San Francisco seeks a GPU Optimisation Engineer to maximize GPU performance in real-time AI systems. The...  ..., a deep understanding of GPU execution, and a knack for optimizing inference latency for large generative models. With a competitive base... 
    Visa sponsorship
    Relocation package

    Trades Workforce Solutions

    San Francisco, CA
    2 days ago
  • MakerMaker.AI in San Francisco is seeking a skilled Software Engineer to write and optimize GPU kernels. You will work on deep low-level tasks that directly impact the performance of machine learning models. The ideal candidate has over 4 years of experience with GPU kernels... 

    MakerMaker.AI

    San Francisco, CA
    1 day ago
  • A leading AI acceleration company in San Francisco is seeking a GPU Kernel Engineer to optimize performance for machine learning models. You will be responsible for designing high-performance GPU kernels and using advanced techniques to boost computation efficiency. Ideal... 

    Baseten

    San Francisco, CA
    4 days ago
  •  ...unicorn founders and senior engineers with deep expertise in...  ...Founding Engineer, ML Inference with deep expertise in...  ...inference frameworks, optimizing inference performance,...  ...edge in ultra-low-latency, high-throughput...  ....compile, custom CUDA kernels, and specialized inference... 
    Relocation
    Visa sponsorship
    Relocation package

    Reactor

    San Francisco, CA
    5 days ago
  • $100k

     ...ultra-long context, and inference-time compute to...  ...About the role:  As a Kernel Engineer, you will design, implement...  ...kernels to optimize throughput and latency...  ...Think beyond the kernel level to the broader scheme...  ...for: Experience with low-level programming of AI... 
    Remote job
    Relocation
    Visa sponsorship

    Magic

    San Francisco, CA
    more than 2 months ago
  •  ...research company in San Francisco is seeking a Systems Engineer focused on kernel optimization and AI-assisted workflows. You'll develop tooling to improve...  ...in performance optimization, particularly in low-level software. Join us in shaping the future of AI development... 

    OpenAI

    San Francisco, CA
    3 days ago
  • $80 - $120 per hour

     ...and Jack Dorsey . Position: CUDA Engineering Expert Type: Contract...  ...Role Responsibilities Analyze and optimize GPU kernels for performance, efficiency, and hardware...  ...least 1 year of professional or graduate-level research experience with GPUs . Strong... 
    Contract work
    Summer work
    Remote work

    Mercor

    San Francisco, CA
    28 days ago
