Fellow GPU Performance Optimizer for AI Training
Advanced Micro Devices
A technology company is looking for a Fellow GPU Performance Optimization Engineer in San Jose, CA. This role focuses on maximizing the performance of large-scale AI training workloads on AMD GPU platforms. Candidates must have deep expertise in GPU architecture, distributed systems, and ML workloads, alongside strong technical leadership skills. The position offers an opportunity to drive innovations across the software-hardware stack and work on impactful optimizations in an inclusive environment.
- ...experiences-from AI and data centers,... ...Software group. As a Fellow, you will be... ...end-to-end software optimization strategy to achieve... ...industry-leading performance for our top-tier customers... ...inference and training at scale across multi-node/multi-GPU environments. ~...TrainingPerformance
- ...generation computing experiences—from AI and data centers, to PCs, gaming and... ...your career. THE ROLE We are seeking a Fellow GPU Performance Optimization Engineer to join our Models and... ...performance and efficiency of large-scale AI training workloads on AMD GPU platforms. You...TrainingPerformance
- ...computing experiences-from AI and data centers, to... ...We are looking for a Fellow/Sr. Fellow Machine... ...Engineer to join our Training At Scale team. If you... ...training pipeline performance on large scale GPU cluster. Improve the... .... Design and optimize the distributed training...TrainingPerformance
- ...Principal Engineer in Santa Clara, CA to lead AI infrastructure development, define GPU architecture specifications, and drive performance gains in ML systems. The role involves... ...GPU architectures, CUDA programming, and optimizing large-scale ML systems. A Bachelor's, MS...TrainingPerformance
$272k - $431.25k
...Principal Ai And Ml Infra Software Engineer, Gpu Clusters We are seeking a Principal AI and ML Infra... ...such initiatives. Monitor and optimize the performance of our infrastructure ensuring... ...improving substantial distributed training operations using PyTorch (DDP,...TrainingPerformance
- ...ML Systems Engineer — Training & Inference Optimization (MBMB) We are building large-... ...robot foundation models, high-performance training infrastructure,... ...compute stack Optimize GPU utilization across training... ...We are a research-driven AI and robotics company focused...TrainingPerformance
- ...computing experiences-from AI and data centers, to PCs... ...about improving the performance of key applications and... ...challenges in the industry: training and running AI to make... ...establish best practices and optimize performance from the lowest-level GPU kernels to large-scale...TrainingPerformance
$184k - $287.5k
...unlimited potential of AI to define the next era... ...computing. An era in which our GPU acts as the brains of... ...CPUs, and a fully optimized NVIDIA AI and HPC software... ...engineer to lead performance benchmarking and optimization... ...real-world AI training, inference, and HPC workloads...TrainingPerformance$136.8k - $359.72k
...GPU/AI Application Platform Architect - San Jose Location:... ...meet the requirements of high-performance, low cost and easy to operate... ...via application performance optimizations and architecture... ...architecture, familiar with training and inference requirements on...TrainingPerformanceTemporary workLocal area$45 per hour
...You will work on improving the performance and efficiency of large-scale AI models across training, inference, and deployment. This... ...and engineering efforts to optimize deep learning models for speed,... ...is a plus. - Familiarity with GPU programming (CUDA, Triton, or similar...TrainingPerformanceHourly payFull timeSummer workInternshipLocal area
- ...computing experiences-from AI and data centers, to PCs, gaming... ...challenge of distributed training of large models on a large number... ...-to-end training pipeline performance. Optimize the distributed training... ...a plus. Experience with GPU kernel optimization is a plus...TrainingPerformance
- ...Python | Pytorch | Distributed Training | Optimisation | GPU | Hybrid, San Jose, CA... ...Responsibilities: Productize and optimize models from Research into reliable, performant, and cost-efficient services with... ...: ~3–5 years in ML/AI engineering roles owning training...TrainingPerformance
$184k - $287.5k
...Software Engineer, Model Optimization and Edge Deployment -... ...the forefront of the AI revolution,... ...etc. to boost E2E model performance for production deployments... ...proven track record of training, deploying, or optimizing... ...Strong understanding of GPU architecture, the...TrainingPerformance$173.66k - $245.16k
...cutting-edge technologies, optimize partner software stacks... ...solutions that enhance performance and reliability. By... ...databases, and analytics), AI/ML initiatives, and... ...to enable the AI PC and GPU IP to support all of... ...relevant education or training. Your recruiter can share...TrainingPerformanceLocal areaImmediate startShift work
- ...generation computing experiences—from AI and data centers, to PCs, gaming... ...and beyond. Principal / Senior GPU Software Performance Engineer — Post‑Training THE ROLE: Drive the performance of... ...stability across data, model, and optimizer steps. Optimize multi‑GPU/multi‑node...TrainingPerformance
$256k - $414k
...design, scaling, and operations of high‑performance networking for GPU‑based cloud infrastructure. This... ...enabling cloud gaming workloads, AI/ML training, and inference platforms by delivering... ...at scale. Engage with ISPs to optimize low‑latency edge networks and ensure...TrainingPerformanceLocal area$152k - $241.5k
...and Shutdown Time KPIs goal & optimizations Drive end-to-end performance excellence: debug and root-cause GPU bottlenecks and issues for gaming, creator, and AI workload, validate BSP performance... ...across GPU SW stack, LLM training and inference, and Arm architecture...TrainingPerformance
- ...Summary Apple Silicon GPU SW architecture team... ...models across many GPUs, optimizing every layer of the... ...but also dive deep into performance profiling, implement novel... ...help define the future of AI experiences delivered... ...) in the context of ML training/inference ~ Must have...TrainingPerformance
- ...is developing a new class of GPU and AI silicon for large-scale model inference and training. The compiler stack connects industry... ..., and is central to the performance and efficiency the company delivers... ..., and target-specific optimizations. Implement code generation improvements...TrainingPerformanceInternship
$122.44k - $232.19k
...Role and Impact: As a GPU Logic Design Engineer at... ...directly to achieving Intel's performance, power, area, and... ...tools, and methods to optimize logic design for power,... ...web services, HPC, and AI‑accelerated systems. Our... ...relevant education or training. Your recruiter can share...TrainingPerformanceLocal areaImmediate startWorldwideFlexible hoursShift work
- A leading technology company is seeking a Fellow in AI Software to drive the software optimization strategy for top-tier customers. This role involves defining technical vision, leading workload performance engineering, and engaging with customers to deliver tailored solutions...Performance
$207k - $300k
...Engineer, GDC LLM Serving and GPU Performance Google Sunnyvale, CA, USA... ...Language Models? Join the GDC AI Models and Performance team... ...and flexibility. You could be optimizing KV cache transfer mechanisms... ..., and relevant education or training. Your recruiter can share...TrainingPerformanceFull time$224k - $356.5k
...to lead a team of skilled performance engineers collaborating... ...platform is known for its AI dominance in deep learning training and inference. Nonetheless... ...innovative techniques to optimize performance of complex... ...optimization, including GPU parallel programming, e.g...TrainingPerformance$131k - $226k
...- Velox Operators for GPU Location San Jose, California... ...computation to GPUs. Optimize memory bandwidth usage... .... Debug complex performance bottlenecks in a distributed... ...required by applicable law Training and educational... ...resources on our personalized, AI‑driven learning...TrainingPerformanceFull timeTemporary work$207k - $300k
...Experience with modern GPU architectures (NVIDIA, AMD, or other AI accelerators), memory hierarchies, and performance bottlenecks. Experience... ...Experience with compiler optimization, code generation, and runtime... ...relevant education or training. Your recruiter can share...TrainingPerformanceFull timeTemporary workWorldwide$156k - $229k
Senior Design Technology Co-Optimization Engineer Google • Sunnyvale,... ...class IP blocks (e.g., high-performance CPU/GPU cores, SRAM arrays, or high-... ...work to shape the future of AI/ML hardware acceleration. You... ..., and relevant education or training. Your recruiter can share...TrainingPerformanceFull timeWorldwide
- ...computing experiences-from AI and data centers, to PCs, gaming... ...leader for the role of AMD Fellow, OneROCm - driving a unified... ..., models, frameworks, and performance optimization layers. The role also requires... ...: ~ Knowledge in GPU architectures, basic knowledge...Performance
- A leading semiconductor company is looking for a Principal/Senior GPU Software Performance Engineer in San Jose, CA. The role involves optimizing post-training workloads on AMD Instinct GPUs, improving throughput, and collaborating with various teams to drive measurable...TrainingPerformance
$109k - $160k
...GPU Infrastructure Software Engineer Sunnyvale, CA CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers... ...superior infrastructure performance with deep technical... ...AI/ML infrastructure and training / inference. The base...TrainingPerformancePermanent employmentTemporary workCasual workWork at officeRemote workFlexible hours$184k - $287.5k
...our team at NVIDIA and help bring AI solutions to our largest... ...offering support in understanding performance aspects related to tasks like large scale LLM training and inference. Conducting regular... ...diagnostics. Hands-on experience with GPU systems in general including but...TrainingPerformance


