Principal High-Performance LLM Training Engineer

$272k - $431.25k

Dormont Manufacturing Co

NVIDIA is seeking a Principal Engineer to drive the performance of large-scale AI training and post-training workloads across NVIDIA’s full hardware and software stack. This role sits at the intersection of distributed training, GPU architecture, systems software, deep learning frameworks, and performance engineering. You will analyze and optimize frontier-scale LLM workloads running on thousands of GPUs, drive improvements across frameworks such as PyTorch, JAX, NeMo, and NeMo RL, and use insights from real workloads to help shape future NVIDIA GPU, system, and software roadmaps.

We are looking for a deeply technical leader who can operate across abstraction layers: from application-level training behavior to framework/runtime internals, CUDA libraries, communication collectives, memory systems, networking, and GPU architecture. At this level, success means both improving performance directly and setting technical direction, raising the bar for the organization, and influencing multi-functional decisions across NVIDIA.

What you will be doing:

  • Lead end-to-end performance analysis and optimization of innovative LLM pre-training and post-training workloads on the latest NVIDIA hardware and software platforms.
  • Drive workloads closer to speed-of-light performance by identifying and removing bottlenecks across compute, memory, communication, scheduling, parallelism strategy, kernel efficiency, framework overhead, and system-level scaling.
  • Develop production-quality software, tools, models, benchmarks, and analysis infrastructure that improve training performance, efficiency, and developer velocity across NVIDIA’s AI software stack.
  • Build and refine performance models, workload characterizations, and simulation methodologies to guide future GPU, networking, system, and software architecture decisions.
  • Serve as a technical authority for AI training performance, partnering closely with teams across GPU architecture, systems, CUDA libraries, compilers, networking, frameworks, product management, and applied AI.
  • Translate workload insights into concrete hardware and software recommendations, and advocate for changes that improve performance and efficiency across the AI ecosystem.
  • Mentor and provide technical leadership to engineers across the organization, helping establish best practices for large-scale AI performance analysis and optimization.

What we need to see:

  • An MS or PhD (or equivalent experience) in Computer Science, Electrical Engineering, Computer Engineering, or a related field, with 12+ years of relevant work or research experience.
  • Demonstrated principal-level technical impact in one or more of the following areas: large-scale AI training systems, GPU performance optimization, distributed systems, high-performance computing, ML frameworks, compilers/runtimes, or hardware/software co-design.
  • Deep hands-on experience analyzing and optimizing performance of large-scale deep learning workloads, especially transformer-based models, LLM pre-training, reinforcement learning, fine-tuning, or other post-training workloads.
  • Strong understanding of GPU and AI accelerator architecture from individual accelerators to datacenter-scale systems.
  • Experience with distributed training techniques such as data parallelism, tensor parallelism, pipeline parallelism, expert parallelism, sequence parallelism, activation checkpointing, mixed precision training, and communication/computation overlap.
  • A strong track record of using profiling, tracing, benchmarking, and performance modeling tools to diagnose complex bottlenecks and drive measurable improvements.
  • Excellent communication and technical leadership skills, with the ability to influence architecture and software decisions across multiple teams without relying on direct authority.

GPU computing is the most productive and pervasive platform for deep learning and AI. It begins with the most advanced GPUs and the systems and software we build on top of them. We integrate and optimize every deep learning framework. We work with the major systems companies and every major cloud service provider to make GPUs available in data centers and in the cloud. We craft computers and software to bring AI to edge devices, such as self-driving cars and autonomous robots.

AI has the potential to spur a wave of social progress unmatched since the industrial revolution. This opportunity offers you the ability to collaborate with some of the most forward-thinking and hard-working people in the world, shaping the future of AI in a creative and autonomous work environment that encourages innovation. If you’re passionate about working across the full hardware and software stack, from GPU architecture to application code, to achieve optimal performance, we want to hear from you!

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 272,000 USD - 431,250 USD. You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until May 2, 2026. This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Vacancy posted 1 day ago
Do you want to receive more vacancies?

Subscribe and receive similar vacancies to Principal High-Performance LLM Training Engineer. Be the first to apply!