Manager, Large Language Model Inference

$184k - $287.5k

NVIDIA

At NVIDIA, we aren't just powering the AI revolution; we're accelerating it. The TensorRT inference platform is the backbone of modern AI, delivering the industry's fastest and most efficient deployment of cutting-edge deep learning models on every NVIDIA GPU. With demand for AI exploding, particularly for large language models (LLMs), vision language models (VLMs), and vision-language-action models (VLAs), we are significantly expanding our team. We're seeking a highly skilled and driven Engineering Manager to lead the development of the next generation of LLM/VLM/VLA inference software technologies that will define the future of AI. This is a high-impact, hands-on leadership role at the intersection of deep technical expertise and world-class management. You won't just manage; you'll architect and guide a brilliant team of engineers building the core LLM inference runtime. Your work will be highly collaborative, interfacing directly with NVIDIA researchers, GPU architects, and other teams across the company to ensure we ship production-grade, lightning-fast software that sets the global standard for AI performance.

What You'll Be Doing:
Lead and grow a team responsible for specialized kernel development, runtime optimizations, and frameworks for LLM inference.
Drive the design, development, and delivery of production inference software targeting NVIDIA's next-generation enterprise and edge hardware platforms.
Integrate cutting-edge technologies developed at NVIDIA and offer an intuitive developer experience for LLM deployment.
Lead software development execution, with responsibility for project planning, milestone delivery, and cross-functional coordination.

What We Need to See:
MS, PhD, or equivalent experience in Computer Science, Computer Engineering, AI, or a related technical field.
7+ years of overall software engineering experience, including 3+ years of technical leadership experience.
Proven ability to lead and scale high-performing engineering teams, especially across distributed and cross-functional groups.
Strong background in C++ or Python, with expertise in software design and in delivering production-quality software libraries.
Demonstrated expertise in large language models (LLMs) and/or vision language models (VLMs).

Ways to Stand Out from the Crowd:
Deep understanding of GPU architecture, CUDA programming, and system-level performance tuning.
Background in LLM inference or experience with frameworks such as TensorRT-LLM, vLLM, or SGLang.
Passion for building scalable, user-friendly APIs and enabling developers in the AI ecosystem.
A proven track record of growing and managing a team that encourages idea sharing, empowers team members, and provides opportunities for professional growth.

We are widely considered to be one of the technology world's most desirable employers, and we have some of the most forward-thinking and hardworking people in the world working with us. Due to outstanding growth, our best-in-class teams are expanding rapidly. If you're a creative self-starter with a real passion for technology, come join us.

#LI-Hybrid

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 2 and 224,000 USD - 356,500 USD for Level 3. You will also be eligible for equity and benefits. Applications for this job will be accepted at least until November 4, 2025.

NVIDIA is committed to fostering a diverse work environment and is proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.
NVIDIA Corporation


Vacancy posted 4 days ago

Similar jobs that could be interesting for you, based on the Manager, Large Language Model Inference vacancy in Santa Clara, CA

Senior Product Marketing Manager, AI Inference
...industry-leading training and inference speeds and allows users to run large-scale ML applications with less hardware management. Cerebras’ customers include top model labs, global enterprises, and... ...environments. Familiarity with large language models, foundation model...
Language
Cerebras
Sunnyvale, CA
2 days ago
Large Language Model Algorithm Engineer
$124.8k - $260.4k
...training, instruction fine-tuning, post-training, training and inference acceleration, evaluation, and more, to maintain a leading... ...and improve the theoretical and engineering systems for large language models in games, exploring the application of LLM technology in game...
Language
Work experience placement
Worldwide
Relocation package
Dormont Manufacturing Co
Palo Alto, CA
2 days ago
Research Scientist - Vision Language Model
$150k
...Institute of Foundation Models We are a dedicated... ...understanding, using, and risk-managing foundation models. Our... ...in the Vision Language Model (VLM) team, your... ...research and development of large-scale VLM systems,... ...model modularity, and inference optimization. Build...
Language
Institute of Foundation Models
Sunnyvale, CA
3 days ago
Senior Deep Learning Engineer - Model Evaluation & AI Systems
$224k - $356.5k
...Principal Deep Learning Engineer — Model Evaluation & AI Systems, you... ...result pipelines running on large GPU clusters. Collaborate... ...Work alongside model training, inference, and product divisions to... ...Hands‑on experience with large language models and NLP, including model...
Language
NVIDIA Gruppe
Santa Clara, CA
1 day ago
Staff Machine Learning Engineer - Foundation Model
$215.28k - $364.32k
...state-of-art ML infrastructure for training very large foundation model and accelerating model training/inference. Our mission is to solve the autonomous driving problem... .... Experience in training large scale vision or language models. Previous experience in the autonomous...
Language
Full time
Dormont Manufacturing Co
Santa Clara, CA
2 days ago
Research Intern - World-Action Foundation Model, Robotics
...flexibility and trust our employees to manage their schedules responsibly.... ...of miles of data from large fleets, and deploy methods they... ...pretraining world‑action foundation model with various world modalities... ..., human data incorporation, language modality, and spatial...
Language
For contractors
For subcontractor
Casual work
Internship
Work at office
Immediate start
Remote work
Day shift
Decisive Point
Sunnyvale, CA
3 days ago
Senior Software Engineer, AI Model Lifecycle
$172.43k - $230.95k
...Software Engineer for the AI Model Lifecycle team will play a... ...in building a comprehensive managed platform for the entire application... ...Learning models, including Large Language Models (LLMs). What You’ll... ...on GPU systems and inference frameworks. Benefits: Competitive...
Language
Temporary work
Crusoe
Sunnyvale, CA
3 days ago
Sr./Staff ML Infrastructure Engineer, Compute (TPU Scheduling) - Foundation Model
$181.1k - $318.4k
...Engineer on the Foundation Model Compute Infrastructure... ...systems for large‑scale TPU workloads across... ...distributed systems that manage thousands of accelerators... ...large‑scale training and inference jobs. This role spans... ...C++, or similar systems languages Extensive experience with...
Language
Relocation
Apple
Santa Clara, CA
1 day ago
Member of Technical Staff - Model Training
$175k - $350k
...pioneering this future with human-centered AI models that unite emotional intelligence (EQ)... ...and perspectives. Platform — large-language models (LLMs) and APIs that enable builders... ...aggressive quality targets. Collaborate with inference, safety, and product teams to land...
Language
Inflection AI
Palo Alto, CA
2 days ago
Senior Manager, Interactive World Model Platforms
$272k - $431.25k
...for a new generation of interactive world-model systems. With this release, NVIDIA has set the standard for fidelity, real-time inference performance in world models. And, we've... ...experience, CI/CD, and release management. Partner with research, simulation, rendering...
NVIDIA Corporation
Santa Clara, CA
1 day ago
Senior Manager, Interactive World Model Platforms
...for a new generation of interactive world-model systems. With this release, NVIDIA has set the standard for fidelity, real‑time inference performance in world models. We have... ...developer experience, CI/CD, and release management. Partner with research, simulation, rendering...
NVIDIA Gruppe
Santa Clara, CA
3 days ago
Creative Writing Generative AI Analyst - California
$38 per hour
...perform annotation efforts of multimedia and language data labeling and review work (e.g.,... ...used to train and improve generative AI models. Project Details Job Title: Creative Writing... ...Familiarity with generative AI systems, large language models, RLHF, or multimodal AI...
Language
Full time
Remote work
Sonara Inc.
Santa Clara, CA
2 days ago
Member of Technical Staff - Voice Model
$150k
...You will join the Grok Voice Model team to help build the world’... ...processing, frontier speech-language pre-training, and intensive post... ...: Design and execute large-scale speech data curation and... ...scale distributed training and inference systems on Kubernetes. Proactive...
Language
Temporary work
Pantera Capital
Palo Alto, CA
2 days ago
Knowledge Management Technical Project Manager New College Graduate - Bachelor's (Santa Clara, CA)
...Technical Program Manager Applied Materials' Global Services organization is seeking a... ...programming to cleanse, integrate, and analyze large datasets Develop automated data-... ...or native speaker in one or more of these languages: Japanese, Korean, Simplified Chinese, Traditional...
Language
Full time
Relocation
Applied Materials
Santa Clara, CA
1 day ago
Applied Scientist II, Foundation Model
...interaction. This role presents an opportunity to shape the future of robotics through innovative applications of deep learning and large language models. We leverage advanced robotics, machine learning, and artificial intelligence to solve complex operational challenges at...
Language
Worldwide
Califesciences
Sunnyvale, CA
1 day ago
Manager, Next-Gen AI Cluster Validation
$224k - $356.5k
...to support the development of large-scale supercomputing systems... ...development and system automation with languages such as Go, Python, or... ...and multi‑node training and inference workloads Expertise with high... ...track record of growing and managing a team that encourages idea...
Language
Remote work
NVIDIA Gruppe
Santa Clara, CA
1 day ago
Manager, Deep Learning Algorithms
...is seeking an engineering manager to lead engineering... ...productizing Deep Learning models. Academic and commercial... ...background, with exposure to large scale LLM/VLM deployment, inference optimization, and leadership... ...experience with Large Language Models (LLMs) and Large Visual...
Language
NVIDIA Gruppe
Santa Clara, CA
1 day ago
Senior Deal Manager, China Large-Order Ops
Applied Materials, Inc. seeks a Deal Manager for Sales Operations in Santa Clara, CA. This role focuses on managing large order engagements with China customers, ensuring compliance with regional requirements and facilitating strategic customer interactions. The candidate...
Full time
Applied Materials, Inc.
Santa Clara, CA
1 day ago
Senior Deal & Sales Ops Lead: China Large Orders
Applied Materials is seeking a senior contributor to support large and complex customer engagements in the China market. This role involves... ...relationships with various stakeholders. Experience in order management processes and a proficiency in Mandarin are crucial for success...
Local area
Applied Materials
Santa Clara, CA
5 days ago
Full-Time Family Medicine Physician for Large Public Health and Hospital System in Silicon Valley
...Full-Time Family Medicine Physician for Large Public Health and Hospital System in Silicon... ...information. The Health Information Management Systems Society (HIMSS) recognized SCVMC... ...and human services, taking culture and language into account. Physicians who join our Primary...
Language
Full time
Live in
Relocation
Santa Clara Valley Healthcare
San Jose, CA
3 days ago
Senior DL Engineer: Edge Model Optimization & Inference
NVIDIA Gruppe is looking for a skilled professional to enhance the performance of large-scale models through advanced optimization techniques in Santa Clara, California. Candidates should have a strong background in DL model training and deployment, ideally with a PhD...
NVIDIA Gruppe
Santa Clara, CA
1 day ago
Principal Software Engineer - Large-Scale LLM Memory and Storage Systems
$272k - $425.5k
Principal Software Engineer – Large-Scale LLM Memory and... ..., low-latency inference framework for serving generative... ...AI and reasoning models across multi-node distributed... ..., routes requests, and manages shared KV cache across... ...scale. As large language models rapidly outgrow...
Language
Local area
Remote work
NVIDIA Corporation
Santa Clara, CA
4 days ago
Senior Research Engineer, Foundation Model Training Infrastructure
$224k - $356.5k
...building cutting‑edge infrastructure for large‑scale foundation model training in the Generalist Embodied... ..., CUDA programming, and cluster management tools like Kubernetes. Strong programming... ...in Python and a high-performance language such as C++ for efficient system...
Language
Full time
NVIDIA Gruppe
Santa Clara, CA
1 day ago
Director of Sales Account Management (Foundry)
...business acumen. You do not just “manage” a key account—you know their... ...suite and speak their language, whether you are talking yield... ...relationships and negotiating large, complex contracts with multiple... ...understanding of foundry business models, customer buying cycles, and...
Language
Contract work
Local area
Shift work
Synopsys Inc
Sunnyvale, CA
2 days ago
Lead AI Engineer (Vision model customization, VLM)
$197.3k - $225.1k
...Lead AI Engineer (Vision model customization, VLM) Overview At Capital One... ...research scientists, technical program managers, and product managers to deliver AI-... ...including foundation model training, large language model inference, similarity search, guardrails, model...
Language
Full time
Part time
Local area
Capital One
San Jose, CA
4 days ago
Senior AI Model Lifecycle Engineer (LLMs & Fine-Tuning)
Crusoe is seeking a Senior Software Engineer for the AI Model Lifecycle team to manage fine-tuning systems and implement training pipelines for Large Language Models. This role requires expertise in Generative AI and the ability to autonomously lead projects that have...
Language
Crusoe
Sunnyvale, CA
3 days ago
Senior Edge-LLM Real-Time Inference Engineer
...to join their TensorRT Edge-LLM team in Santa Clara, California. The role involves developing a state-of-the-art inference framework for large language models and optimizing it for real-time performance on embedded platforms. Candidates should have a strong background in...
Language
NVIDIA Gruppe
Santa Clara, CA
1 day ago
ML Engineer - Inference & Model Deployment
...who can help us turn powerful AI and ML models into fast, reliable production systems. You... ...: deploying models, optimizing inference latency and throughput, scaling serving systems... ...production environments. Have experience with large-scale model serving, multi‑GPU inference,...
Relocation package
HiringCafe
Cupertino, CA
3 days ago
Senior Applied Scientist, Delivery Foundation Model at Amazon.com Services LLC Santa Clara, CA
Senior Applied Scientist, Delivery Foundation Model job at Amazon.com Services LLC. Santa... ...on vast amounts of Amazon data and infer at Amazon scale, taking advantage of latest... ...environments using Python, C++ or other languages. Strong publication record at top-tier conferences...
Language
Worldwide
Itlearn360
Santa Clara, CA
3 days ago
Product Marketing Manager - AI Platform Software
$152k - $230k
...looking for a technical product marketing manager who is passionate about AI frameworks... ...that convey the value of training and inference frameworks, such as PyTorch, JAX,... ...expertise - Familiarity with popular large language models like DeepSeek, GPT-OSS, Gemma and Phi...
Language
Work experience placement
NVIDIA Corporation
Santa Clara, CA
1 day ago