AI Inference Performance Engineer
$152k - $241.5k
NVIDIA Gruppe
We optimize and benchmark GenAI inference on NVIDIA's latest accelerators, defining the industry’s performance standards across language models, video generation, and speech workloads. We work directly within TensorRT-LLM, SGLang, and vLLM, building the tools that evaluate serving performance at scale. This team sits at the intersection of GPU performance engineering and public accountability.

What You Will Be Doing:
- Drive industry benchmark results: own the end-to-end optimization pipeline, implement and integrate optimizations in quantization, scheduling, memory management, and distributed inference across TensorRT-LLM, SGLang, and vLLM.
- Define and optimize cutting-edge workloads: identify and shape next-generation inference benchmarks, multi-turn coding, agentic workflows, and other emerging AI use cases. Collaborate with framework and kernel teams to push performance to its extreme on large-scale LLM-MoE models, vision-language models, video diffusion models, recommendation, and speech workloads.
- Architect distributed inference: Design and optimize execution from single-GPU to rack-scale clusters, managing performance across clusters of GPUs.
- Establish performance methodology: Apply roofline analysis and systematic profiling to decompose bottlenecks across CUDA kernels, frameworks, and serving layers.
- Influence the ecosystem: contribute to TensorRT-LLM, vLLM, SGLang, and other open-source projects. Partner with architecture, kernel, and compiler teams to shape GPU roadmaps based on real workload data.
- Technical Leadership: Raise the technical bar for the team, drive cross-functional execution on tight benchmark timelines, and lead a world-class team.

What We Need To See:
- BS, MS, or PhD in Computer Science, Computer Engineering, Electrical Engineering, or equivalent experience.
- 5+ years of relevant software development experience.
- Strong Python or C++ programming, software design, and software engineering skills.
- Expertise with a DL framework such as PyTorch or JAX.
- Proven track record of delivering measurable performance improvements in deep learning inference or high-performance systems.
- Deep understanding of LLM/VLM architectures and inference mechanics: attention, KV caching, batching strategies, decode-phase bottlenecks, speculative decoding, disaggregated serving, etc.

Ways To Stand Out From The Crowd:
- Prior experience with an LLM framework (TensorRT-LLM, vLLM, SGLang, etc.) or a DL compiler in inference, deployment, algorithms, or implementation.
- Prior experience with performance modeling, profiling, debug, and code optimization of a DL/HPC/high-performance application.
- Experience with scale-out inference orchestration (MPI, NCCL, K8S) on large GPU clusters.
- Expertise in kernel development (CUTLASS, cuteDSL, tilelang, OpenAI Triton) or compiler/runtime paths (torch.compile, graph lowering, operator fusion).
- Architectural knowledge of CPU, GPU, FPGA or other DL accelerators; GPU programming experience (CUDA).
- Track record of leading ambiguous, high-impact technical programs across multiple teams under tight deadlines.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD. You will also be eligible for equity and benefits.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.


