RLEE - Low-Level Engineering & Kernel Inference Optimization
$90 - $125 per hour
Open Data Science
RL Environments, Kernel Optimization, GPU/CUDA, Compilers (LLVM/MLIR), PyTorch Extensions, Distributed Inference (vLLM/NCCL)

Brief Description of the Role

We're hiring Low-Level Engineers to design and build RL environments that teach LLMs kernel development, hardware optimization, and systems programming. The goal is to create realistic feedback loops where models learn to write high-performance code across GPU and CPU architectures. This is a remote contractor role that requires at least 4 hours of overlap with PST and advanced English (C1/C2).

About the Company

Preference Model is building the next generation of training data to power the future of AI. Today's models are powerful but fail to reach their potential across diverse use cases because so many of the tasks we want to use these models for are out of distribution. Preference Model creates RL environments where models encounter research and engineering problems, iterate, and learn from realistic feedback loops. Our founding team has previous experience on Anthropic's data team building data infrastructure, tokenizers, and datasets behind the Claude model. We are partnering with leading AI labs to push AI closer to achieving its transformative potential. The company is backed by Tier 1 Silicon Valley VC.

Responsibilities

- Design and build MLE/SWE environments and diverse tasks.
- Target a specified language model and satisfy the required difficulty distribution.

Requirements

Minimum qualifications

- Strong Python (engineering-quality, not notebook-only)
- Clear understanding of LLMs and their current limitations
- Ability to meet throughput expectations and respond quickly to feedback

You may be a good fit if one of the following applies

- Deep understanding of memory hierarchies (registers, L1/L2/shared memory, HBM, system RAM) and their performance implications
- Threading models, synchronization primitives, and concurrent programming (warps, thread blocks, barriers, atomics)
- Cache coherence, memory access patterns, coalescing, and bank conflicts
- AOT compilation and optimization passes (LLVM, MLIR, TVM)
- Compiler and kernel frameworks such as CUTLASS, BitBLAS, or JAX/Pallas
- Modern C++, including templates, concurrency, and build systems
- Assembly-level programming and low-level optimization across GPU and CPU architectures (e.g., x86, ARM, NVIDIA Hopper, NVIDIA Blackwell)
- Debugging and optimizing GPU kernels using CUDA and/or HIP/ROCm
- Developing PyTorch custom operators, backend extensions, or dispatcher integrations (e.g., ATen, TorchScript, or custom backends)
- Customizing, extending, or optimizing vLLM, including distributed inference workflows
- GPU communication libraries and collectives, such as NVIDIA NCCL, AMD RCCL, MPI, or UCX
- Mixed-precision and low-precision kernels (e.g., FP16, BF16, FP8, INT8), including numerical stability and performance trade-offs

Working conditions

Hourly contractor rate: 90-125 USD/hour (dependent on expertise level and the quality of the take-home assignment).

