Senior Deep Learning Architect, LLM Inference
NVIDIA, $184k - $287.5k
NVIDIA is at the forefront of the generative AI revolution. The Inference Benchmarking (IB) team focuses on optimizing inference server performance for Large Language Models (LLMs). If you're passionate about pushing the boundaries of GPU hardware and software performance and understand terms like disaggregated serving, data parallel attention, MoE, Qwen3.5, DeepSeek, and GPT-OSS, then this is a great role for you!
What you'll be doing:
- Characterize workloads of the latest LLMs and inference servers such as vLLM, SGLang, and TRT-LLM to ensure NVIDIA maintains its leadership position.
- Join forces with the performance marketing team to build engaging content, including blog posts and updates to InferenceX to highlight NVIDIA's outstanding inference achievements.
- Collaborate with engineers from AI startup companies to establish standard benchmarking methodologies.
- Develop a continuously evolving website that publishes inference performance results.
- Invent E2E profiling and analysis tools that you will use to keep up with the rapid pace of Generative AI.
- Contribute to deep learning software projects, such as PyTorch, TRT-LLM, vLLM, and SGLang, to drive advancements in the field.
- Verify that new GPU product launches produce industry-leading performance.
- Collaborate across the company to guide the direction of inference serving, working with software, research, and product teams to ensure best-in-class performance.
- Use the latest coding agents and inference technology to improve team efficiency.
What we need to see:
- Master's or PhD degree in Computer Science, Computer Engineering, related fields, or equivalent experience.
- 6+ years of relevant software development experience.
- Detailed knowledge of deep learning inference serving, PyTorch programming, profiling, and compiler optimizations.
- Experience developing client-server LLM applications with the OpenAI API or MCP and identifying performance bottlenecks.
- Solid understanding of CPU and GPU microarchitecture and performance characteristics.
- Experience with complex software projects like frameworks, compilers, or operating systems.
- Demonstrated proficiency with the latest AI coding agents like Claude Code, Codex, and Cursor.
- Excellent written and verbal communication skills and the ability to work independently and collaboratively in a fast-paced environment.
Ways to stand out from the crowd:
- Demonstrate a drive to continuously improve software and hardware performance.
- Showcase examples of novel use cases for agentic AI tools in the workplace.
- Experience with databases and visualization tools will set you apart.
NVIDIA is widely considered to be one of the technology world's most desirable employers. We have a team of highly skilled and motivated individuals who excel in their work. If you have a proactive and independent approach, we want to hear from you!
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.
You will also be eligible for equity and benefits.
Applications for this job will be accepted at least until March 3, 2026.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

