Senior Software Engineer, CUDA Deep Learning Systems
Dormont Manufacturing Co
$184k - $287.5k
We are looking for an experienced and highly motivated software professional to work on pioneering initiatives and projects at the intersection of CUDA and Deep Learning Systems. As the complexity and scale of artificial intelligence continue to grow, the intersection of advanced deep learning architectures, massive‑scale distributed computing, and low‑level hardware optimization has never been more critical. Our team is dedicated to exploring and prototyping next‑generation ideas that bridge the gap between deep learning algorithms and CUDA, pushing the boundaries of what is possible on modern accelerator architectures.

Join our dynamic, research‑oriented team to help unlock maximum hardware performance for emerging AI workloads. You will be a crucial member of a highly technical group exploring uncharted territories in model optimization, custom kernel development, and cluster‑scale AI systems design. If you are passionate about the fundamentals of deep learning and thrive on squeezing every ounce of performance out of advanced computing systems, from a single GPU to supercomputer clusters, we want you on our team!

What you will be doing:
- Explore, research, and prototype novel systems optimizations for advanced deep learning models at the intersection of high‑level DL frameworks and low‑level CUDA through modeling, simulation, and silicon prototyping.
- Architect and optimize distributed computing systems that scale seamlessly from a single node to massive, cluster‑scale supercomputing environments.
- Design, implement, and optimize custom high‑performance CUDA kernels tailored to emerging neural network architectures and workloads.
- Analyze complex hardware‑software interactions to identify and resolve performance bottlenecks in both training and inference pipelines.
- Collaborate closely with AI researchers, HW and SW architects, kernel and compiler authors, and CUDA driver experts to co‑design systems and algorithms that improve accelerator compute utilization, memory bandwidth, cross‑node network communication efficiency, and programmability.
- Develop exploratory tools and runtime systems to profile and accelerate new paradigms in deep learning.
- Write clean, effective, and maintainable code, ensuring exploratory prototypes can smoothly transition into open‑source releases, upstream framework integrations, internal tools, or closed‑source commercial products.

What we need to see:
- BS, MS, or PhD degree in Computer Science, Computer Engineering, Electrical Engineering, or related field (or equivalent experience).
- 8+ years of relevant industry experience or equivalent academic experience after degree achievement.
- Strong proficiency in C++ and Python programming.
- Solid background in the fundamentals of Deep Learning with a focus on transformers.
- Strong understanding of distributed computing principles, multi‑node scaling, and the unique performance challenges of cluster‑scale execution.
- Proven experience in systems programming, computer architecture, and low‑level systems performance optimization.
- Familiarity with deep learning accelerator architectures such as the GPU and hands‑on experience with CUDA programming and kernel optimization.
- A strong analytical approach with experience using profiling tools to deeply understand software performance on hardware.
- Experience profiling and optimizing innovative vision models, generative AI architectures, or diffusion models.
- Background in deep learning compilers, both graph‑level and codegen (e.g., Triton, XLA, torch.compile).

Ways to stand out from the crowd:
- Deep expertise in the performance internals and execution graphs of major deep learning autograd, training, and inference frameworks (e.g., PyTorch, JAX, TensorRT, vLLM, SGLang, NeMo, Megatron, MaxText).
- Hands‑on experience with CUDA, communication libraries (e.g., NCCL, MPI, UCX), and distributed machine learning techniques (e.g., pipeline parallelism, tensor parallelism).
- Knowledge of numerical methods, low‑precision arithmetic (e.g., NVFP4, MXFP4, FP8, INT8), and their implications for deep learning model accuracy and performance.
- Familiarity with systems requirements for Reinforcement Learning (RL) or highly parallel simulation environments, and/or research background in machine learning systems or adjacent fields.
- Experience with machine learning, especially agentic systems, applied to systems problems.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5. You will also be eligible for equity and benefits.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.


