
AI Inference Performance Engineer

$152k - $241.5k

NVIDIA Gruppe

We optimize and benchmark GenAI inference on NVIDIA's latest accelerators, defining the industry’s performance standards across language models, video generation, and speech workloads. We work directly within TensorRT-LLM, SGLang, and vLLM, building the tools that evaluate serving performance at scale. This team sits at the intersection of GPU performance engineering and public accountability.

What You Will Be Doing:
  • Drive industry benchmark results: own the end-to-end optimization pipeline, implement and integrate optimizations in quantization, scheduling, memory management, and distributed inference across TensorRT-LLM, SGLang, and vLLM.
  • Define and optimize cutting-edge workloads: identify and shape next-generation inference benchmarks, multi-turn coding, agentic workflows, and other emerging AI use cases. Collaborate with framework and kernel teams to push performance to its extreme on large-scale LLM-MoE models, vision-language models, video diffusion models, recommendation, and speech workloads.
  • Architect distributed inference: design and optimize execution from single-GPU to rack-scale clusters, managing performance across clusters of GPUs.
  • Establish performance methodology: apply roofline analysis and systematic profiling to decompose bottlenecks across CUDA kernels, frameworks, and serving layers.
  • Influence the ecosystem: contribute to TensorRT-LLM, vLLM, SGLang, and other open-source projects. Partner with architecture, kernel, and compiler teams to shape GPU roadmaps based on real workload data.
  • Technical leadership: raise the technical bar for the team, drive cross-functional execution on tight benchmark timelines, and lead a world-class team.

What We Need To See:
  • BS, MS, or PhD in Computer Science, Computer Engineering, Electrical Engineering, or equivalent experience.
  • 5+ years of relevant software development experience.
  • Strong Python or C++ programming, software design, and software engineering skills.
  • Expertise with a DL framework such as PyTorch or JAX.
  • Proven track record of delivering measurable performance improvements in deep learning inference or high-performance systems.
  • Deep understanding of LLM/VLM architectures and inference mechanics: attention, KV caching, batching strategies, decode-phase bottlenecks, speculative decoding, disaggregated serving, etc.

Ways To Stand Out From The Crowd:
  • Prior experience with an LLM framework (TensorRT-LLM, vLLM, SGLang, etc.) or a DL compiler in inference, deployment, algorithms, or implementation.
  • Prior experience with performance modeling, profiling, debug, and code optimization of a DL/HPC/high-performance application.
  • Experience with scale-out inference orchestration (MPI, NCCL, K8S) on large GPU clusters.
  • Expertise in kernel development (CUTLASS, cuteDSL, tilelang, OpenAI Triton) or compiler/runtime paths (torch.compile, graph lowering, operator fusion).
  • Architectural knowledge of CPU, GPU, FPGA or other DL accelerators; GPU programming experience (CUDA).
  • Track record of leading ambiguous, high-impact technical programs across multiple teams under tight deadlines.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD. You will also be eligible for equity and benefits.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Vacancy posted 4 days ago
Similar jobs that could be interesting for you
Based on the AI Inference Performance Engineer in Santa Clara, CA vacancy
  •  ...California is seeking a talented individual to optimize inference engines for local environments, impacting the future of AI. Applicants should have a strong background in...  ...development, with experience in profiling performance issues. The successful candidate will work... 
    Performance
    Local area

    Intel

    Santa Clara, CA
    3 days ago
  • $152k - $241.5k

     ...deep learning ignited modern AI — the next era of computing —...  ...seeking top‑tier AI Compiler Engineers to drive innovation within our...  ...of what is possible in AI performance and help build the technology...  ...problems for AI workloads (both inference and training) and... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  •  ...leading technology company is seeking a skilled engineer to optimize deep learning frameworks and enhance GPU kernel performance. The ideal candidate excels in collaborative...  ...a focus on innovative solutions and advancing AI technologies.
    Performance

    Advanced Micro Devices

    Santa Clara, CA
    3 days ago
  • $184k - $356.5k

    A leading AI computing company in California is seeking a Senior Deep Learning Software Engineer focused on performance optimization of LLM models. You will analyze and enhance LLM inference performance, working in cross-collaborative teams to implement cutting-edge algorithms... 
    Performance
    Full time

    NVIDIA Corporation

    Santa Clara, CA
    2 days ago
  •  ...seeking a Senior DL Algorithms Engineer to optimize LLM/Omni models and enhance performance across its software stack. The ideal...  ...deep learning, specifically in inference. This role involves profiling,...  ...collaborating with teams to advance AI solutions. A strong... 
    Performance

    NVIDIA

    Santa Clara, CA
    4 days ago
  •  ...Systems builds the world's largest AI chip, 56 times larger than...  ...‑leading training and inference speeds and empowers machine learning...  ...Role We are hiring a Senior Performance Analyst to join our Product...  ...Collaborate with Product and Engineering to identify where competitors... 
    Performance
    Contract work
    Shift work

    Cerebras

    Sunnyvale, CA
    2 days ago
  • $170.5k - $315.49k

    Inference Optimization Engineer (local / edge runtime) Locations: US, California, Santa Clara: US,...  ...Mission: At Intel, our journey is to transform AI into something safer, more...  ...across hardware tiers and publish honest performance comparisons. Upstream fixes and patches... 
    Performance
    Internship
    Local area
    Immediate start
    Shift work

    Intel

    Santa Clara, CA
    17 hours ago
  • $224k - $356.5k

     ...into the unlimited potential of AI to define the next era of...  ...at the forefront of AI and high-performance computing. As a Senior / Principal Deep Learning Engineer — Model Evaluation & AI Systems...  ...Work alongside model training, inference, and product divisions to provide... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  • $160k - $253k

     ...centers are transforming into AI factories, and NVIDIA accelerated computing is the engine of artificial intelligence. Our...  ...center platforms integrate high performance compute, networking, and a full...  ...performance and efficiency for AI inference & training. What you’ll be... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  •  ...design and verification with agentic AI workflows. Our platform...  ...cutting‑edge generative AI to assist engineers in RTL design, simulation, and...  ...Systems Engineer to optimize the performance and efficiency of large language model inference powering our agentic AI platform... 
    Performance

    ScOp Venture Capital

    Santa Clara, CA
    4 days ago
  • A leader in AI technology in Palo Alto is seeking a Senior AI Systems Performance Engineer to optimize the latest foundation models on their innovative platform. This role involves collaborating with cross-functional teams to push the performance limits of AI systems.... 
    Performance

    SambaNova

    Palo Alto, CA
    1 day ago
  • $170.6k - $261.3k

     ...global scale. Our Embodied AI teams are redefining what’s possible...  ...stop. As a Senior Software Engineer on the Secondary Driving...  ...testing, continuous integration, performance profiling, and observability...  ...GPU/accelerator‑based ML inference, model deployment, and performance... 
    Performance
    Remote work
    Relocation package
    Flexible hours

    General Motors

    Sunnyvale, CA
    17 hours ago
  • $200k

    Velaura is seeking a Senior RTL Engineer to build next-generation Physical AI SoCs. This role involves collaboration with teams to drive microarchitectural decisions and ensure high-performance, power-efficient hardware. The ideal candidate has over 8 years of experience... 
    Performance

    Velaura

    Santa Clara, CA
    4 days ago
  • Apple Inc. is seeking a Software Performance Engineer for its Vision Products Group in Sunnyvale, California. In this role, you'll optimize AR/VR system software for high-performance and low-latency experiences. You will work with a team aiming to push the boundaries of... 
    Performance

    Apple Inc.

    Sunnyvale, CA
    4 days ago
  • $165k - $180k

    Lead Ultrasound Imaging Engineer (AI & Systems) iSono Health is a dynamic and rapidly growing...  ...onward to ensure excellent field performance, high reliability, supply continuity, efficient...  ...platforms (e.g., Jetson) for edge AI inference and image processing acceleration.... 
    Performance

    iSono Health Inc.

    Sunnyvale, CA
    17 hours ago
  • Advanced Micro Devices (AMD) is seeking a skilled engineer to optimize deep learning frameworks for AMD GPUs. You will enhance GPU performance, accelerate deep learning models, and work...  ...an opportunity to significantly impact AI solutions while fostering innovation and... 
    Performance

    Advanced Micro Devices

    Santa Clara, CA
    3 days ago
  • Lemurian Labs in Santa Clara seeks a Runtime Engineer to design and develop a multi-target runtime for their AI compiler stack. This role involves low-level parallelization...  ...with compiler and product teams to enhance performance across diverse hardware. The ideal candidate... 
    Performance

    Lemurian Labs

    Santa Clara, CA
    1 day ago
  • $150k - $250k

    MixMode in Santa Clara is looking for a Senior Staff SI/PI Engineer to ensure the electrical integrity of high-performance AI compute platforms. This role involves driving AI accelerator strategies and leading the simulation efforts for complex chip packages. Candidates... 
    Performance

    MixMode

    Santa Clara, CA
    17 hours ago
  • Intel Corporation is seeking a Senior Compiler Engineer to develop and optimize compiler software for next-generation GPU architectures...  ...on cutting-edge compiler technologies that enhance AI and high-performance computing performance. The ideal candidate will possess a Bachelor... 
    Performance

    Intel

    Santa Clara, CA
    4 days ago
  • $207k - $300k

    Google is seeking an experienced AI/ML Software Engineer to enhance GPU architectures and optimize performance benchmarks. The role involves collaborating with teams to solve ML model challenges and architect transformative AI solutions, contributing to Google's machine... 
    Performance

    Google

    Sunnyvale, CA
    1 day ago
  • NVIDIA Gruppe is looking for a senior engineer to join their Math Libraries team in Santa Clara...  ...involves designing and implementing high-performance numerical linear algebra software on GPUs...  ...opportunity to be part of cutting-edge AI and data center technologies.
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  • $180k - $260k

     ...As a Senior MEMS Design Engineer at nEye.ai, you will be responsible for the design, simulation, and optimization of MEMS devices that...  ...Engineers to ensure our MEMS structures are not only high‑performing but also robust and reliable for high‑volume manufacturing.... 
    Performance

    nEye Systems, Inc.

    Santa Clara, CA
    4 days ago
  •  ...Corporation is seeking a Senior Systems Software Engineer for their DGX Cloud team in Santa Clara, California. In this role, you will lead performance and scalability analysis of Kubernetes-...  ..., ensuring high efficiency for AI workloads. You will work closely with researchers... 
    Performance

    NVIDIA Corporation

    Santa Clara, CA
    1 day ago
  • $163k - $237k

     ...Inc. is seeking an experienced candidate to shape the future of AI/ML hardware acceleration, focusing on TPU technology that...  ...and verifying power delivery networks, ensuring reliability and performance in advanced designs. Expect a collaborative environment working... 
    Performance

    Google Inc.

    Sunnyvale, CA
    17 hours ago
  • $136k - $218.5k

    NVIDIA in Santa Clara is seeking a Silicon Speed Features Engineer to co-design system-level speed features across...  ...involves collaborating cross-functionally and using AI to enhance automation tools for performance validation. Ideal candidates should have a Master’s degree... 
    Performance

    NVIDIA

    Santa Clara, CA
    1 day ago
  • $147k - $202.5k

     ...Materials is a global leader in materials engineering solutions used to produce virtually every...  ...that literally connect our world – like AI and IoT. If you want to push the boundaries...  .... You will collect and analyze data, perform hardware characterization, and troubleshoot... 
    Performance
    Full time
    Work experience placement
    Relocation

    APPLIED MATERIALS

    Santa Clara, CA
    5 hours ago
  • NVIDIA Gruppe is seeking a Silicon Speed Features Engineer to lead validation and automation infrastructure for silicon issues. You will work across teams to ensure product quality and performance in a dynamic environment. This role requires an MS in EE or equivalent,... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    1 day ago
  • $250k

     ...eGain is the leader in AI knowledge management solutions for enterprises. As organizations...  ...As Director of Site Reliability Engineering, you will ensure that eGain’s AI knowledge...  ...platform operates with the reliability, performance, and resilience that enterprise... 
    Performance
    Work at office

    eGain Corporation

    Sunnyvale, CA
    4 days ago
  • $120k - $180k

     ...security with the world's most advanced AI-native platform. We work on large scale...  ...About the Role You’ll work closely with engineering teams to expand test coverage across unit...  ...foundation that ensures reliability and performance as we deploy AI security controls across... 
    Performance
    Full time
    Contract work
    Work experience placement
    Work at office
    Local area

    CrowdStrike Holdings, Inc.

    Sunnyvale, CA
    2 days ago
  • $184k - $287.5k

    NVIDIA Gruppe is seeking talented AI systems engineers to advance innovative technologies in AI inference systems software. This role involves developing cutting-edge libraries, code generators, and kernel technologies for NVIDIA's architecture, emphasizing high-impact... 

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
