Principal AI Performance Architect for Scalable GPU Training
Advanced Micro Devices
Advanced Micro Devices is looking for a Principal Engineer in Santa Clara, CA to lead AI infrastructure development, define GPU architecture specifications, and drive performance gains in ML systems. The role involves leading innovative techniques, collaborating with stakeholders, and establishing best practices for distributed ML systems. The ideal candidate has extensive experience in GPU architectures, CUDA programming, and optimizing large-scale ML systems. A Bachelor's, MS, or PhD in Computer Science or Engineering is required.
$272k - $431.25k
We are seeking a Principal AI and ML Infra Software Engineer, GPU Clusters at NVIDIA to join our... ...potent, effective, and scalable solutions as we mold... ...Monitor and optimize the performance of our infrastructure ensuring... ...distributed training operations using PyTorch...
Principal, Training, Performance
$272k - $431.25k
...group is solving some of AI’s hardest... ...interconnects. This Principal Architect role leads the research... ...communication systems: GPU-to-GPU, GPU-to-storage... ...deep expertise in high-performance networking (InfiniBand... ...parallelism, or distributed training and inference patterns...
Principal, Training, Performance
- ...experiences, from AI and data centers,... ...career. THE ROLE As a Principal Engineer, you will... ...by defining GPU architecture specifications... ...massive model training at scale. Your... ...expertise will drive 2-3x performance gains in both... ...dimensions Architect memory‑efficient training...
Principal, Training, Performance, Remote work
$272k - $431.25k
...Responsibilities Develop innovative high-performance processor and system architectures, focusing on the memory system and energy efficiency... ...micro‑architecture features to improve the state‑of‑the‑art in GPU memory systems, optimizing along the axes of perf/W, perf/mm,...
Principal, Performance
- Graphcore in Milpitas is looking for a highly accomplished GPU Architect to lead the design of AI accelerators and multi-GPU clusters. The role involves... ...across hardware and software teams to ensure optimal performance and reliability of GPU infrastructures as AI demands...
Performance
- NVIDIA Gruppe in Santa Clara is seeking a Deep Learning Communication Architect to optimize DNN models and enhance communication performance during distributed training. This role requires collaboration with hardware/software teams to implement efficient communication...
Training, Performance
$208k - $327.75k
...accelerated computing, AI, and autonomous... ...-of-the-art AI, high-performance compute, and scalable software-defined architectures... ...for a Senior AI Architect to help define the next... ...SoCs, including GPU, CPU, DLA, memory hierarchy... ...of distributed training systems, scaling laws...
Training, Performance, Worldwide
- ...experiences, from AI and data... ...THE ROLE: As a Principal AI Infrastructure... ...large‑scale LLM training and inference on... ...strong expertise in GPU‑accelerated... ..., high‑performance AI workloads at... ...Kubernetes and SLURM. Architect and validate... ...where applicable, scalable checkpointing)...
Principal, Training, Performance
$272k - $431.25k
...Always‑On, low‑overhead GPU profiling service that... ...interfaces, data flows, and scalability guarantees for multi‑... .../platform layers, and performance counter/trace providers... ...with existing ML/AI workflows (e.g., PyTorch... ...on experience tuning ML training/inference loops based on...
Principal, Training, Performance
- ...world's largest AI chip, 56 times... ...industry-leading training and inference... ...times faster than GPU‑based... ...We're hiring a Principal Engineer for our... ...workloads, high‑QPS performance, and the operating... ...& Performance. Architect active‑active... ...requirements into scalable system designs...
Principal, Training, Performance
$180k - $200k
...the World’s leading AI-first Quality... ...looking for a Gen AI Architect to join our growing... ...solutions are scalable, secure, compliant... ...LangGraph) that meet performance, scalability, and... ...unstructured) for training, fine-tuning, and... ...Optimization - Implement GPU optimization,...
Training, Performance, Full time, Casual work, Local area, Flexible hours
$224k - $356.5k
...artificial intelligence (AI) / deep learning (DL), high-performance computing (HPC),... ...socket CPU and CPU/GPU systems. Work... ...CPU and interconnect architects to improve future... ...enabling faster AI model training, agentic use-cases,... ...processing, and scalable cloud deployments....
Training, Performance
$296.3k
...Role: We are seeking a Principal AI Engineer to lead the... ...powers large-scale training and cloud inference.... .... What You’ll Do: Architect, build, and optimize... ...for reliability, scalability, and performance across the AI/ML platform... ...systems, GPU computing, and cloud...
Principal, Training, Performance, Remote work, Flexible hours
- ...generation computing experiences, from AI and data centers, to PCs, gaming and... ...re Architect THE ROLE: As GPU Software Architect, you will provide technical... ...architectural intent translates into working, performant, and scalable solutions for partnerships...
Principal, Performance, Remote work
$192k - $267k
Principal Architect, AI and Semiconductors, Google Cloud Google San Francisco, CA, USA; Sunnyvale... ...developed for security, reliability, and scalability, running the full stack from... ...experience, and relevant education or training. US: $192000 - $267000 (USD) + 42.86...
Principal, Training
$190k - $280k
...experience, education, and training. We also offer incentive opportunities... ...on individual and company performance. This is in addition to our... ...the potential of generative AI to power the transformation... ...per week Hybrid. The role: Principal Architect- Performance Analysis and...
Principal, Training, Performance, Full time, 3 days per week
$209.5k - $299.2k
...evolving markets like AI, cloud, networking, and... ...the Role The DFT Architect at Altera is a senior... ...of innovation in high‑performance compute, AI acceleration... ...generations. You will architect scalable, robust, and forward‑... ..., experience, and training. Incentive...
Principal, Training, Performance, Local area, Shift work
- NVIDIA Gruppe in Santa Clara is seeking a technical leader for the GPU AI/HPC Infrastructure team. You will design and implement cutting... ...edge GPU compute clusters, focusing on deep learning and high-performance computing. The ideal candidate will have at least 5+ years of...
Performance
$296.3k
...team in Embodied AI and is responsible... ...datasets for model training and evaluation, now... ...reliability, and scalability of next‑generation... ...vehicles. Role As a Principal Engineer in the... ...visualize AV model performance. As a full‑stack... ...on modern cloud / GPU infrastructure, with...
Principal, Training, Performance, Local area, Flexible hours
$272k - $431.25k
NVIDIA Corporation seeks a Principal AI and ML Infra Software Engineer in Santa Clara, California... ...the efficiency of AI/ML research on GPU Clusters. The role involves collaboration... ...teams, monitoring infrastructure performance, and implementing improvements based on...
Principal, Performance
- Overview We are now looking for a Senior GPU & Deep Learning Architect to join the NVIDIA GPU Architecture... ...architectures targeting both training and inference workloads. Advance the... ...validate, and verify functional or performance models. Develop tests, test plans, and...
Training, Performance
- ...the next generation of AI breakthroughs and power... ...engineers and systems architects, Graphcore enjoys a culture... ...experienced GPU Architect to define the... ...trillion‑parameter LLM training and high‑throughput localized... ...reliability, and interconnect performance strategies that ensure...
Training, Performance
$172k - $349k
Principal Software Engineer, Systems/Solutions... ...reliability, scalability, resiliency, and performance across highly complex... ...across product lines. Architect scalable, reusable,... ...adoption of AI-assisted testing workflows... ..., education/training, and/or skill level...
Principal, Training, Performance, Work experience placement, Work at office, 2 days per week
$160k - $225k
...seeking a Senior Software Engineer for AI Runtime in Mountain View, California. This... ...and scaling systems for large-scale GPU training, contributing to the architecture of a managed... ...$225,000, with additional benefits and performance bonuses. United States...
Training, Performance
$128k - $312k
...Expect The Tesla AI Hardware team is at... ...built to efficiently train massive neural... ...computational efficiency and performance. By creating... ...the AI/ML Compute Architect will drive the... ...create efficient, scalable solutions that power... ...knowledge of CPU, GPU, and ML...
Training, Performance, Hourly pay, Temporary work, Work at office, Flexible hours
$143k - $286k
...ML, causal inference, and Gen‑AI. Desirable Gen‑AI skills: embedding... ..., MapReduce, HQL, Scala), and GPU/CUDA for computational... ...000‑$264,000 in Hoboken) plus performance‑based bonuses. ~401(k) match... ...reimbursement, and access to internal training. ~ Special considerations...
Principal, Training, Performance, Temporary work, Flexible hours
- NVIDIA Corporation is searching for a Senior GPU Architect in Santa Clara, CA to innovate and contribute to the design of our proprietary... ...utilizing hardware modeling and verification to enhance GPU performance insights. Prospective candidates should possess a Master's or...
Performance, Remote job
- d-Matrix is seeking a Principal Compute Design Engineer to be responsible for... ...micro-architecture and design of AI sub-system modules. You will collaborate with System Architects to develop efficient solutions, ensuring high performance and efficiency in RTL design. The...
Principal, Performance, 3 days per week
- ...NVIDIA Gruppe seeks a Principal Architect to drive the architectural vision for AI communication systems. This role involves setting the technical direction,... ...networking and systems software, particularly in high-performance environments, as well as a degree in Computer...
Principal, Performance
$240k - $379.5k
...customers where they are on their AI journey on our GPUs - this... ...Product Manager for AI Platform training you will be responsible for... ...builders to get the best large scale performance, resilience and experience on... ...deep learning across all GPU use cases and providing great...
Principal, Training, Performance


