Manager, Large Language Model Inference
$184k - $287.5k
NVIDIA
At NVIDIA, we aren't just powering the AI revolution; we're accelerating it. The TensorRT inference platform is the backbone of modern AI, delivering the industry's fastest and most efficient deployment of cutting-edge deep learning models on every NVIDIA GPU. With demand for AI exploding, particularly for large language models (LLMs), vision language models (VLMs), and vision-language-action models (VLAs), we are significantly expanding our team.

We're seeking a highly skilled and driven Engineering Manager to lead the development of the next generation of LLM/VLM/VLA inference software technologies that will define the future of AI. This is a high-impact, hands-on leadership role at the intersection of deep technical expertise and world-class management. You won't just manage; you'll architect and guide a brilliant team of engineers who are building the core LLM inference runtime. Your work will be highly collaborative, interfacing directly with NVIDIA Researchers, GPU Architects, and other teams across the company to ensure we ship production-grade, lightning-fast software that sets the global standard for AI performance.

What You'll Be Doing:
- Lead and grow a team responsible for specialized kernel development, runtime optimizations, and frameworks for LLM inference.
- Drive the design, development, and delivery of production inference software, targeting NVIDIA's next-generation enterprise and edge hardware platforms.
- Integrate cutting-edge technologies developed at NVIDIA and offer an intuitive developer experience for LLM deployment.
- Lead software development execution, with responsibility for project planning, milestone delivery, and cross-functional coordination.

What We Need to See:
- MS, PhD, or equivalent experience in Computer Science, Computer Engineering, AI, or a related technical field.
- 7+ years of overall software engineering experience, including 3+ years of technical leadership experience.
- Proven ability to lead and scale high-performing engineering teams, especially across distributed and cross-functional groups.
- Strong background in C++ or Python, with expertise in software design and delivering production-quality software libraries.
- Demonstrated expertise in large language models (LLMs) and/or vision language models (VLMs).

Ways to Stand Out from the Crowd:
- Deep understanding of GPU architecture, CUDA programming, and system-level performance tuning.
- Background in LLM inference or experience with frameworks such as TensorRT-LLM, vLLM, or SGLang.
- Passion for building scalable, user-friendly APIs and enabling developers in the AI ecosystem.
- Proven track record of growing and managing a team that encourages idea sharing, empowers team members, and provides opportunities for professional growth.

We are widely considered to be one of the technology world's most desirable employers, and we have some of the most forward-thinking and hardworking people in the world working with us. Due to outstanding growth, our best-in-class teams are expanding rapidly. If you're a creative self-starter with a real passion for technology, then come join us.

#LI-Hybrid

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 2, and 224,000 USD - 356,500 USD for Level 3. You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until November 4, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
NVIDIA Corporation



