Manager, Large Language Model Inference

$184k - $287.5k

NVIDIA

At NVIDIA, we aren't just powering the AI revolution; we're accelerating it. The TensorRT inference platform is the backbone of modern AI, delivering the industry's fastest and most efficient deployment of cutting-edge deep learning models on every NVIDIA GPU. With demand for AI exploding, particularly for large language models (LLMs), vision language models (VLMs), and vision-language-action models (VLAs), we are significantly expanding our team.

We're seeking a highly skilled and driven Engineering Manager to lead development of the next generation of LLM/VLM/VLA inference software technologies that will define the future of AI. This is a high-impact, hands-on leadership role at the intersection of deep technical expertise and world-class management. You won't just manage; you'll architect and guide a brilliant team of engineers who are building the core LLM inference runtime. Your work will be highly collaborative, interfacing directly with NVIDIA Researchers, GPU Architects, and other teams across the company to ensure we ship production-grade, lightning-fast software that sets the global standard for AI performance.

What You'll Be Doing:
  • Lead and grow a team responsible for specialized kernel development, runtime optimizations, and frameworks for LLM inference.
  • Drive the design, development, and delivery of production inference software, targeting NVIDIA's next-generation enterprise and edge hardware platforms.
  • Integrate cutting-edge technologies developed at NVIDIA and offer an intuitive developer experience for LLM deployment.
  • Lead software development execution, with responsibility for project planning, milestone delivery, and cross-functional coordination.

What We Need to See:
  • MS, PhD, or equivalent experience in Computer Science, Computer Engineering, AI, or a related technical field.
  • 7+ years of overall software engineering experience, including 3+ years of technical leadership experience.
  • Proven ability to lead and scale high-performing engineering teams, especially across distributed and cross-functional groups.
  • Strong background in C++ or Python, with expertise in software design and delivering production-quality software libraries.
  • Demonstrated expertise in large language models (LLMs) and/or vision language models (VLMs).

Ways to Stand Out from the Crowd:
  • Deep understanding of GPU architecture, CUDA programming, and system-level performance tuning.
  • Background in LLM inference or working with frameworks such as TensorRT-LLM, vLLM, or SGLang.
  • Passion for building scalable, user-friendly APIs and enabling developers in the AI ecosystem.
  • A proven track record of growing and managing a team that encourages idea sharing, empowers team members, and provides opportunities for professional growth.

We are widely considered to be one of the technology world's most desirable employers, and we have some of the most forward-thinking and hardworking people in the world working with us. Due to outstanding growth, our best-in-class teams are rapidly growing. If you're a creative self-starter with a real passion for technology, then come join us.

#LI-Hybrid

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 2, and 224,000 USD - 356,500 USD for Level 3. You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until November 4, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Vacancy posted 11 hours ago
