Senior Deep Learning Architect, LLM Inference

$184k - $287.5k

NVIDIA

NVIDIA is at the forefront of the generative AI revolution. The Inference Benchmarking (IB) team focuses on inference server performance optimization for Large Language Models (LLMs). If you're passionate about pushing the boundaries of GPU hardware and software performance and understand terms like disaggregated serving, data parallel attention, MoE, Qwen3.5, DeepSeek, and GPT-OSS, then this is a great role for you!

What you'll be doing:

Characterize workloads of the latest LLMs and inference servers such as vLLM, SGLang, and TRT-LLM to ensure NVIDIA maintains its leadership position.
Join forces with the performance marketing team to build engaging content, including blog posts and updates to InferenceX to highlight NVIDIA's outstanding inference achievements.
Collaborate with engineers from AI startup companies to establish standard benchmarking methodologies.
Develop a continuously evolving website of inference performance results.
Invent E2E profiling and analysis tools that you will use to keep up with the rapid pace of Generative AI.
Contribute to deep learning software projects such as PyTorch, TRT-LLM, vLLM, and SGLang to drive advancements in the field.
Verify that new GPU product launches produce industry-leading performance.
Collaborate across the company to guide the direction of inference serving, working with software, research, and product teams to ensure best-in-class performance.
Use the latest coding agents and inference technology to improve team efficiency.

What we need to see:

Master's or PhD degree in Computer Science, Computer Engineering, related fields, or equivalent experience.
6+ years of relevant software development experience.
Detailed knowledge of deep learning inference serving, PyTorch programming, profiling, and compiler optimizations.
Experience developing client-server LLM applications with the OpenAI API or MCP and identifying performance bottlenecks.
Solid understanding of CPU and GPU microarchitecture and performance characteristics.
Experience with complex software projects like frameworks, compilers, or operating systems.
Demonstrated proficiency with the latest AI coding agents like Claude Code, Codex, and Cursor.
Excellent written and verbal communication skills and the ability to work independently and collaboratively in a fast-paced environment.

Ways to stand out from the crowd:

Demonstrate a drive to continuously improve software and hardware performance.
Showcase examples of novel use cases for agentic AI tools in the workplace.
Experience with databases and visualization tools will set you apart.

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have a team of highly skilled and motivated individuals who excel in their work. If you have a proactive and independent approach, we want to hear from you!

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until March 3, 2026.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Vacancy posted 2 days ago
