Principal AI Performance Architect - Large-Scale GPU Training
Advanced Micro Devices, Inc.
Advanced Micro Devices, Inc. is seeking a Principal Engineer to lead the development of next-generation AI infrastructure. Your role will involve defining GPU architecture specifications and optimizing large-scale model training processes.
Candidates should have deep expertise in GPU microarchitecture and experience with CUDA programming. This position allows for remote work for the right individual, with a preference for candidates based in Santa Clara, CA.
- ...Micro Devices is looking for a Principal Engineer in Santa Clara, CA to lead AI infrastructure development, define GPU architecture specifications, and drive performance gains in ML systems. The role... ...programming, and optimizing large-scale ML systems. A Bachelor's, MS... Principal, Training, Performance
- ...The Role As a Principal Engineer, you... ...generation of AI infrastructure by defining GPU architecture specifications... ...massive model training at scale. Your expertise... ...drive 2-3x performance gains in both... ...impact on large‐scale ML workloads... ...dimensions Architect memory‐... Training, Performance, Remote work
$272k - $431.25k
...solving some of AI’s hardest... ...interconnects. This Principal Architect role leads the... ...communicate at scale, across GPUs, DPUs... ...systems, GPU-to-GPU, GPU-to-... ...communication runtimes for large-scale AI... ...expertise in high-performance networking (... ...or distributed training and inference patterns... Principal, Training, Performance
- ...is for a Senior Principal Engineer, AI/ML System Architect. As system architect... ...including AI training and inference workloads and performance demands, as well... ...interact with large OEM customers on... ...or other modern GPU accelerators and... ...drawing board to full-scale production and... Principal, Training, Performance, Local area, Remote work
$254k - $349.25k
...people, data, and AI agents connect... ...Fortune 100, 10,000 large enterprises, and... ...We are seeking a Principal ML Architect to lead the... ...model architecture, training, fine-tuning, and... ...of operating at scale across high-volume... ...continuously improve model performance and reliability... Principal, Training, Performance, Flexible hours
$254k - $349.25k
...people, data, and AI agents connect... ...Fortune 100, 10,000 large enterprises, and... ...We are seeking a Principal ML Architect to lead the design... ...architecture, training, fine‑tuning, and... ...of operating at scale across high-volume... ...continuously improve model performance and reliability... Principal, Training, Performance, Flexible hours
$272k - $431.25k
...serving generative AI and reasoning... ...Built in Rust for performance and Python for extensibility... ...orchestrates GPU shards, routes... ...at datacenter scale. As large language models rapidly... ...We are seeking a Principal Systems Engineer... ...LLM inference. Architect and implement... Principal, Performance, Local area, Remote work
- ...computing experiences, from AI and data centers, to... ...a Robotics AI Architect to define and scale next‑generation Physical... ...production‑grade performance targets. THE PERSON As... ...co‑design across CPU, GPU, and accelerators Lighthouse... ...subsystems Cloud (training, simulation, fleet... Principal, Training, Performance
- ...position is for a Senior Principal Engineer, AI/ML System Architect. As system architect,... ...design including AI training and inference workloads and performance demands, as well as... ...and interact with large OEM customers on ongoing... ..., or other modern GPU accelerators and support... Principal, Training, Performance, Local area, Remote work
$206.4k - $379.1k
...whether individuals or large organizations, to... ...content. The AI Foundations team... ...creativity at scale in design, imaging... ...re looking for a Principal Architect to build and implement... ...to support model training, fine‐tuning,... ...Develop high‐performance runtime services... Principal, Training, Performance, Local area, Worldwide, Flexible hours
- ...experiences-from AI and data... ...are seeking a Principal Software Quality... ...Instinct™ GPU platforms. You... ..., workload, performance, stress, stability, scale-out, and system... ...- LLM training and inference... ...tracking. Architect the test infrastructure... ...systems and large-scale cluster... Principal, Training, Performance, Contract work, Shift work
- ...Clara is seeking a technical leader for the GPU AI/HPC Infrastructure team. You will design... ..., focusing on deep learning and high-performance computing. The ideal candidate will have at least 5 years of experience with large-scale infrastructure, strong programming... Performance
- ...seeks a Staff HPC Engineer to design and optimize large-scale compute environments for scientific computing and AI workloads. The ideal candidate should have... ...researchers and developers, focusing on scalability and performance optimization. KLA offers competitive benefits... Performance
$208k - $327.75k
...accelerated computing, AI, and autonomous... ...-the-art AI, high-performance compute, and... ...looking for a Senior AI Architect to help define the... ...SoCs, including GPU, CPU, DLA, memory... ...architectures and large-scale model systems Experience... ...of distributed training systems, scaling... Training, Performance, Worldwide
$240k - $379.5k
...customers where they are on their AI journey on our GPUs - this... ...Manager for AI Platform training you will be responsible for... ...model builders to get the best large scale performance, resilience and experience... ...enabling deep learning across all GPU use cases and providing... Principal, Training, Performance
- ...Principal AI Agent / ML Software Engineer The... ...used in large-scale, business-critical... ...observability. Design, architect, and deliver... ...high throughput, GPU efficiency, reliability... ...reliability, performance, security posture... ...GPU inference or training workloads for latency... Principal, Training, Performance
- General Motors is seeking a Principal Engineer to lead the... ...and visualizing AV model performance. This role will involve scaling state-of-the-art tools and... ...teams within the Embodied AI group. The ideal candidate... ..., and experience with large-scale systems. Exceptional... Principal, Performance
$272k - $431.25k
...We are seeking a Principal AI and ML Infra Software Engineer, GPU Clusters at NVIDIA to join our Hardware Infrastructure... .... Monitor and optimize the performance of our infrastructure ensuring high... ...substantial distributed training operations using PyTorch (DDP, FSDP... Principal, Training, Performance
- ...experiences, from AI and data centers,... ...THE ROLE: As a Principal AI Infrastructure... ...customers to enable large‑scale LLM training and inference on... ...expertise in GPU‑accelerated computing... ...resilient, high‑performance AI workloads at scale... ...and SLURM. Architect and validate Kubernetes... Principal, Training, Performance
- ...experiences, from AI and data centers,... ...AMD’s Data Center GPU organization is... ...highly accomplished Principal Modeling Architect to join the... ...requirements and performance projections. Your... ..., datatypes, and scaling methodologies to... ...analyzing AI/ML, HPC, or large‑scale data... Principal, Performance, Remote work
- ...computing experiences-from AI and data centers,... ...improving the performance of key applications... .../ML workloads and GPU-accelerated computing... ..., including Large Language Models (LLMs... ...and accelerate LLM training and inference on AMD... ...clusters, including large-scale training and... Principal, Training, Performance
$272k - $431.25k
...creative solutions architect with experience in... ...to join the NVIDIA GPU Cloud Infrastructure... ...capacity planning and scale testing. Ensure... ..., DNS, QUIC. High‑performance networking and low‑... ...Experience designing large‑scale distributed... ...2026. NVIDIA uses AI tools in its recruiting... Principal, Performance
$296.3k
...Role: We are seeking a Principal AI Engineer to lead the... ...that powers large-scale training and cloud inference.... ...autonomy. What You’ll Do: Architect, build, and optimize... ...reliability, scalability, and performance across the AI/ML... ...distributed systems, GPU computing, and cloud... Principal, Training, Performance, Remote work, Flexible hours
- ...transportation on a global scale. The Data... ...AV product performance through smart... ...existing very large datasets that... ...foundation model pre-training and fine-tuning... ...impact team of AI/ML engineers,... .... As a Principal Technical Lead... ...in designing, architecting, and deploying... Principal, Training, Performance, Local area, Remote work, Work from home, Relocation, Relocation package, Flexible hours
$272k - $431.25k
...unlimited potential of AI to define the next era... .... An era in which our GPU acts as the brains of... ...world. At NVIDIA, as a Principal Rack Scale Systems Infrastructure... ...engineering bar for large‑scale networked systems... ...silicon, or other high‑performance computing systems.... Principal, Performance, Shift work
$296.3k
...a part of the Scaling Foundations team in Embodied AI and is responsible... ...for model training and evaluation,... ...utilization of our large datasets,... ...vehicles. Role As a Principal Engineer in the... ...visualize AV model performance. As a full‑... ...modern cloud / GPU infrastructure,... Principal, Training, Performance, Local area, Flexible hours
- NVIDIA Gruppe in Santa Clara is looking for a Senior Systems Software Engineer to focus on GPU performance at scale. You will be instrumental in driving innovation in AI and GPU computing, contributing to state-of-the-art computing hardware. The ideal candidate has extensive... Performance
- ...globally for innovation, performance and quality.... ...In this AI/ML ASIC Architecture... ...As an AI/ML ASIC Architect you will help drive... ...the AI Storage with GPU/TPU/xPU accelerators... ...efficient inference/training systems utilizing... ...experience optimizing large-scale ML systems, GPU... Training, Performance, Temporary work, Remote work, Flexible hours, Shift work, Night shift
$184k - $356.5k
...Software Engineer in Santa Clara to enhance the performance and reliability of large-scale AI infrastructures. The role involves leadership in debugging and optimizing distributed training workloads across NVIDIA’s GPU platforms. Ideal candidates should have extensive... Training, Performance
- ...feature engineering, model training, deployment,... ...explainability, and responsible AI compliance. **... ...**Proven experience** architecting large-scale ML/AI platforms with attention to performance, scalability, and maintainability... .... The ideal Principal has a deep technical science... Principal, Training, Performance


