Senior Software Engineer, RL Post-Training Frameworks

NVIDIA Gruppe

Overview

Reinforcement learning post‑training is driving some of the most significant capability gains in AI today. It is the process that teaches a model to reason through hard problems, follow complex instructions, and act as an autonomous agent. It is also one of the hardest infrastructure challenges in the field. RL requires inference, rollout generation, and training running in a continuous loop. The rollout step is what makes it hard: the model must interact with environments, tools, and other models to produce the signal that drives learning. Coordinating actor, critic, and reward models across heterogeneous hardware at scale pushes the limits of what distributed systems can do.

NVIDIA is building an RL Frameworks engineering team to develop the open‑source tools and infrastructure that AI researchers and post‑training teams depend on. The team spans the full software stack, from collaborating closely with the researchers and labs pushing the frontier, to contributing to RL frameworks like VeRL, Miles, and TorchTitan, to improving the distributed runtimes they depend on, including Ray and Monarch. Whether your strength is working with researchers to understand and address their needs, optimizing deep learning frameworks, or building distributed infrastructure, we want to hear from you. Come join us to build the systems that enable the next generation of AI.

Responsibilities

What you will be doing:
Architect and build RL post‑training infrastructure that scales efficiently from experimentation on a single GPU to production across thousands of nodes. This means tuning RL training‑inference‑rollout loops on GPUs, CPUs, and LPUs for performance where it matters, contributing to and improving the performance and usability of open‑source RL frameworks, and partnering with the teams who own them.
Span fault tolerance, elastic scaling, and fast restarts so long‑running distributed training jobs survive failures, stragglers, and resource contention.
Partner with teams building CPU‑driven rollout workloads, including tool‑use, code execution, and agentic environments, supplying the systems and framework engineering needed to run them efficiently alongside GPU‑ or LPU‑accelerated generation and GPU‑accelerated training.
Advocate for researcher and partner needs with NVIDIA’s networking, math library, and compiler teams so the capabilities RL workloads require get prioritized and delivered, and work with hardware teams to take advantage of next‑generation hardware capabilities in post‑training workloads.

Qualifications

What we need to see:
MS or PhD in Computer Science, Computer Engineering, or a related field (or equivalent experience)
5+ years of professional experience in distributed systems, high‑performance computing, deep learning infrastructure, or ML systems engineering
Strong proficiency in Python and C/C++
Demonstrated experience building or contributing to large‑scale distributed systems or runtime frameworks in production at a frontier AI lab, hyperscaler, or major technology company
Strong verbal and written communication skills and the ability to collaborate across organizational and geographic boundaries
Depth in one or more of the following technical areas:
Reinforcement learning for LLM post‑training (RLHF, PPO, GRPO, DPO, reward modeling), including how algorithms map to distributed execution and the systems challenges they create (heterogeneous placement, rollouts, environment execution, resharding between training and generation)
PyTorch internals, including distributed training primitives (FSDP, tensor parallelism, pipeline parallelism) and their composition
Kubernetes runtime internals (container lifecycle, pod scheduling, resource quotas, GPU allocation)
End‑to‑end distributed systems design (service boundaries, data flows, consistency models, failure modes, recovery approaches)

Nice to Have

Experience in any of the following areas is a plus:
Deep expertise in networking (NCCL, NVLink, InfiniBand), advanced multi‑dimensional parallelisms (Megatron‑LM, FSDP2, TP/DP/PP, MoE), or memory optimizations (quantization‑aware training, mixed precision)
Experience integrating high‑performance inference engines (vLLM, SGLang, TensorRT‑LLM) into RL training loops for GPU‑accelerated rollout
Strong background in actor‑ and task‑based distributed programming (Ray, Monarch, or comparable systems)
Familiarity with multi‑turn training, multi‑agent co‑evolution, or VLM post‑training

Ways to Stand Out

Ways to stand out from the crowd:
Open‑source contributions to RL post‑training or distributed training projects (e.g., VeRL, Miles, TorchTitan, OpenRLHF, NeMo‑Aligner, DeepSpeed‑Chat), including significant work on framework internals where applicable
Kubernetes work beyond routine operations (custom operators, GPU device plugins, or scheduling contributions)
Direct experience operating frontier‑scale training (RL post‑training at thousands of GPUs and/or large‑scale LLM or multimodal pre‑training)
Hands‑on experience with production distributed failures at scale (stragglers, resource contention, hardware faults)

Benefits

NVIDIA offers highly competitive salaries and a comprehensive benefits package. You will also be eligible for equity and benefits.

EEO Statement

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Vacancy posted 4 hours ago

