AI Inference Performance Engineer Scale LLMs & GPU Clusters

$124k - $195.5k

NVIDIA

NVIDIA Corporation is seeking an AI Inference Performance Engineer - New College Grad 2026 in Santa Clara. This role involves optimizing AI inference benchmarks using NVIDIA’s accelerators and working with various teams on performance enhancements. Applicants should have a solid background in software engineering and deep learning frameworks. The position offers a competitive salary and benefits, with a Level 2 salary range of 124,000 USD to 195,500 USD. Join a pioneering team that is shaping the AI landscape.

Vacancy posted 12 hours ago
Similar jobs that could be interesting for you, based on the AI Inference Performance Engineer Scale LLMs & GPU Clusters vacancy in Santa Clara, CA
  • $184k - $356.5k

     ...looking for a Senior Software Engineer specializing in Deep Learning Inference in Santa Clara, California...  ...will design and optimize GPU-accelerated software critical for advanced AI applications, contributing...  .... The role includes performance optimization and collaboration... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    13 hours ago
  •  ...experiences-from AI and data...  ...Staff AI Infra Engineer who is...  ...improving the performance of key applications...  ...and GPU-accelerated computing...  ...Language Models (LLMs) and Agentic...  ...training and inference on AMD GPUs,...  ...on GPU clusters, including large-scale training and... 
    Performance

    Advanced Micro Devices, Inc.

    Santa Clara, CA
    5 days ago
  • $184k - $287.5k

    NVIDIA is seeking a Senior Systems Software Engineer focusing on GPU Performance at Scale. This role involves driving innovation in AI and GPU computing, collaborating with developers and researchers to enhance system workflows. Key duties include leading performance practices... 
    Performance

    NVIDIA

    Santa Clara, CA
    4 days ago
  • NVIDIA Gruppe is seeking a Software Performance at Scale Intern in Santa Clara, CA,...  ...opportunity to work with a leading engineering team focused on AI and computing. The role includes collaborating...  ...optimizations in large GPU clusters. The ideal intern is currently enrolled... 
    Performance
    Internship

    NVIDIA Gruppe

    Santa Clara, CA
    1 day ago
  • $184k - $287.5k

     ...and motivated software engineers to join us and build AI inference systems that serve large-scale models with extreme efficiency...  ...and implement high-performance inference stacks, optimize GPU kernels and compilers,...  ...deployments on GPU clusters across clouds. Conduct... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    13 hours ago
  • $160k - $322k

     ...Santa Clara is seeking a Senior Technical Marketing Engineer focused on GPUs and scale-up architecture. The role involves showcasing NVIDIA's GPU architecture and server-level platforms, aiming to maximize performance for AI applications. The ideal candidate will have at... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    12 hours ago
  • $184k - $356.5k

    NVIDIA Gruppe is looking for skilled software engineers to develop AI inference systems that operate with high efficiency. The role involves architecting high-performance inference frameworks and optimizing GPU processes. Ideal candidates should have extensive programming... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  • $152k - $241.5k

     ...invention of the GPU in 1999 sparked the growth...  ...ignited modern AI — the next era of...  ...-tier AI Compiler Engineers to drive innovation...  ...is possible in AI performance and help build the...  ...on a global scale. What you’ll be...  ...AI workloads (both inference and training) and... 
    Performance

    NVIDIA

    Santa Clara, CA
    4 days ago
  • $124k - $195.5k

     ...cutting edge of AI infrastructure...  ..., we need engineers who can model,...  ...level traffic at scale. If you have a passion for performance analysis, a...  ...datacenter and GPU systems. What you...  ..., and clustering techniques such...  ...fundamentals, LLMs, and modern inference serving frameworks... 
    Performance

    NVIDIA

    Santa Clara, CA
    5 days ago
  • $160k - $253k

     ...transforming into AI factories, and NVIDIA...  ...computing is the engine of artificial intelligence...  ...integrate high performance compute, networking...  ...to power AI at scale. We are looking for...  ...showcasing NVIDIA's GPU architecture, server...  ...efficiency for AI inference & training. What you... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    13 hours ago
  • $152k - $241.5k

     ...motivated Software Engineer to join our growing AI and Generative...  ...of large-scale AI systems powering...  ...applications in LLMs, agentic AI, retrieval...  ...ML training, inference, and generative...  ...platforms supporting GPU clusters, fault‑tolerant...  ..., and high‑performance AI workloads. Develop... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    12 hours ago
  •  ...Department: Backend Engineer · Work type: On-...  ...About Archetype AI Archetype AI is developing...  ...for building performant, scalable, and...  ...into production—at scale, with reliability,...  ...-latency AI model inference and data services....  ...performance across GPU clusters, cloud infrastructure... 
    Performance
    Full time

    Neara

    Palo Alto, CA
    12 hours ago
  • $131k - $175k

     ...Senior Hardware Systems Engineer – AI Rack & Cluster Infrastructure Arista Networks...  ...standards of quality and performance in everything we do. Job...  ...engineers, to deliver rack-scale solutions for the world's...  ...cooling into high-density GPU environments, ensuring performance... 
    Performance
    Remote work
    Flexible hours

    Arista Networks, Inc.

    Santa Clara, CA
    5 days ago
  • $207k - $300k

    Senior Research Engineer, On-Device Inference, Robotics, DeepMind...  ...Language Models (LLMs), including...  ...focused on high-performance inference. Understanding...  ...with AI accelerators (e.g...  ...them to set up large-scale tests and deploy...  ...techniques across GPU, TPU, and CPU architectures... 
    Performance
    Full time

    Google Inc.

    Mountain View, CA
    2 days ago
  •  ...NVIDIA Gruppe seeks a skilled HPC Cluster Engineer to design, deploy, and operate GPU Compute Clusters for high-performance computing workloads. This role involves collaboration with various teams to ensure effective and reliable cluster performance. Key responsibilities... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    13 hours ago
  • $272k - $431.25k

     ...Principal Rack Scale Systems Infrastructure Engineer NVIDIA has been transforming...  ...potential of AI to define the next era...  ...An era in which our GPU acts as the brains...  ...silicon, or other high-performance computing systems....  ...experience with rack- or cluster-scale systems... 
    Performance
    Shift work

    NVIDIA

    Santa Clara, CA
    4 days ago
  •  ...experiences-from AI and data centers...  ...Quality Engineer to serve as the...  ...on AMD Instinct™ GPU platforms. You will...  ...framework, workload, performance, stress, stability, scale-out, and system-...  ...training and inference (PyTorch, vLLM,...  ...and large-scale cluster software ~ System... 
    Performance
    Contract work
    Shift work

    Advanced Micro Devices, Inc.

    San Jose, CA
    5 days ago
  • $152k - $241.5k

     ...NVIDIA's invention of the GPU in 1999 sparked the growth...  ...ignited modern AI — the next era of computing...  ...Deep Learning Compiler Engineer. NVIDIA is hiring software...  ...backbone of NVIDIA’s inference engine, spanning across...  ...deliver leading inference performance, fast build time,... 
    Performance

    NVIDIA

    Santa Clara, CA
    4 days ago
  •  ...Machine Learning Engineer - Inference Serving Frameworks...  ...building rack-level AI inference systems....  ...for data center-scale inference serving....  ...inference serving and cluster scheduling...  ...to architect high‑performance inference stacks and...  ...‑level debugging. GPU kernel development... 
    Performance
    Full time

    Acceler8 Talent

    Santa Clara, CA
    3 days ago
  •  ...located in Santa Clara, CA, is seeking a Senior Systems Software Engineer focused on GPU Performance at Scale. This role entails leading performance practices in large-scale GPU infrastructure and aligning AI workloads with next-generation datacenter builds. The ideal... 
    Performance

    NVIDIA Corporation

    Santa Clara, CA
    4 days ago
  •  ...seeking a technical leader for the GPU AI/HPC Infrastructure team. You will...  ...cutting-edge GPU compute clusters, focusing on deep learning and high-performance computing. The ideal candidate will...  ...+ years of experience with large-scale infrastructure, strong programming... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    1 day ago
  • $224k - $356.5k

     ...unlimited potential of AI to define the...  ...in which our GPU acts as the...  ...of AI and high-performance computing. As a...  ...Deep Learning Engineer — Model Evaluation...  ..., including LLMs, RAG systems, agents...  ...on large GPU clusters. Collaborate...  ...model training, inference, and product divisions... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    13 hours ago
  • $20 - $71 per hour

    NVIDIA is seeking a Software Performance at Scale Intern in Santa Clara, CA. In this role, you will collaborate with engineers to improve software performance across large GPU clusters and analyze workloads to identify optimization opportunities. Candidates should be enrolled... 
    Performance
    Hourly pay
    Internship

    NVIDIA

    Santa Clara, CA
    12 hours ago
  • $152k - $241.5k

    NVIDIA Gruppe is seeking a talented individual to optimize and benchmark GenAI inference using the latest acceleration technologies. The role involves driving industry benchmark results and architecting distributed inference systems. Required qualifications include a relevant... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  • $200k - $322k

     ...self‑motivated senior engineer for the Aerial Omniverse...  ...will design and implement GPU kernels that apply time...  ...and NIC budgets at scale. You will work with the...  ...need to see: PhD in high‑performance computing, computer...  ...existing vacancy. NVIDIA uses AI tools in its recruiting... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    2 days ago
  • $20 - $71 per hour

    NVIDIA Corporation is seeking a Software Performance at Scale Intern in Santa Clara, CA. This role involves working with engineering teams to optimize software performance on large GPU clusters. Candidates should be enrolled in a relevant degree program and have strong... 
    Performance
    Hourly pay
    Internship

    NVIDIA Corporation

    Santa Clara, CA
    4 days ago
  •  ...computing experiences-from AI and data centers, to PCs, gaming...  ...for a strategic software engineering lead who is passionate about improving the performance of key applications and...  ...techniques for optimizing scale-up and scale-out inference. Develop methods and tooling... 
    Performance

    Advanced Micro Devices, Inc.

    Santa Clara, CA
    1 day ago
  • $120.1k - $225.7k

     ...End-to-End Inference Optimization: Lead...  ...and implement high-performance inference frameworks...  ...to build a robust AI inference technical...  ...Science, Electronic Engineering, AI, or related fields...  ...ultra-large-scale models is highly...  ...large-scale inference clusters or driving AI... 
    Performance
    Relocation package

    Tencent

    Palo Alto, CA
    2 days ago
  •  ...potential of generative AI to power the...  ...The role: Analog Design Engineer, Senior / Staff / Sr. Staff...  ...Artificial Intelligence Inference Accelerator and High-Speed...  ...Die-2-Die Interface for scale-out. Job scope includes...  ...circuit design, system level performance analysis, design test... 
    Performance
    3 days per week

    d-Matrix

    Santa Clara, CA
    4 days ago
  • $152k - $241.5k

     ...for inventing the GPU and driving breakthroughs...  ...graphics, high-performance computing, and...  ...everything from generative AI to autonomous...  ...MARS), builds and scales the infrastructure...  ...researchers and engineers to develop the...  ...groundbreaking GPU compute clusters that run demanding... 
    Performance
    Work experience placement

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
