Principal AI Performance Architect for Scalable GPU Training

Advanced Micro Devices, Inc.

Advanced Micro Devices is looking for a Principal Engineer in Santa Clara, CA to lead AI infrastructure development, define GPU architecture specifications, and drive performance gains in ML systems. The role involves leading innovative techniques, collaborating with stakeholders, and establishing best practices for distributed ML systems. The ideal candidate has extensive experience in GPU architectures, CUDA programming, and optimizing large-scale ML systems. A Bachelor's, MS or PhD in Computer Science or Engineering is required.

Vacancy posted 4 days ago
Similar jobs that could be interesting for you
Based on the Principal AI Performance Architect for Scalable GPU Training in Santa Clara, CA vacancy
  • $272k - $431.25k

     ...artificial intelligence (AI), agentic workloads, deep learning (DL), high-performance computing (HPC), cloud...  .... Knowledge of GPU and SOC design. Experience...  ...faster AI model training, agentic use-cases, efficient...  ...data processing, and scalable cloud deployments.... 
    Principal
    Training
    Performance

    NVIDIA

    Santa Clara, CA
    26 days ago
  •  ...The Role As a Principal Engineer, you will spearhead...  ...next generation of AI infrastructure by defining GPU architecture...  ...enable massive model training at scale. Your...  ...expertise will drive 2-3x performance gains in both...  ...parallel dimensions Architect memory‐efficient... 
    Principal
    Training
    Performance
    Remote work

    AMD

    Santa Clara, CA
    2 days ago
  •  ...experiences—from AI and data centers,...  ...seeking a Robotics AI Architect to define and...  ...production‑grade performance targets. THE PERSON...  ...enable broad ecosystem scalability. KEY...  ...design across CPU, GPU, and accelerators...  ...subsystems Cloud (training, simulation, fleet... 
    Principal
    Training
    Performance

    Advanced Micro Devices, Inc.

    San Jose, CA
    2 days ago
  • $272k - $431.25k

     ...We are seeking a Principal AI and ML Infra Software Engineer, GPU Clusters at NVIDIA to join our...  ...potent, effective, and scalable solutions as we mold...  ...Monitor and optimize the performance of our infrastructure ensuring...  ...distributed training operations using PyTorch... 
    Principal
    Training
    Performance

    NVIDIA

    Santa Clara, CA
    2 days ago
  • $272k - $431.25k

     ...group is solving some of AI’s hardest...  ...computing interconnects. This Principal Architect role leads the research...  ...communication systems—GPU-to-GPU, GPU-to-storage...  ...deep expertise in high-performance networking (InfiniBand...  ...parallelism, or distributed training and inference patterns... 
    Principal
    Training
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    2 days ago
  •  ...This position is for a Senior Principal Engineer, AI/ML System Architect. As system architect, one...  ...systems design including AI training and inference workloads and performance demands, as well as compute,...  ...AMD, Intel, or other modern GPU accelerators and support systems... 
    Principal
    Training
    Performance
    Local area
    Remote work

    Celestica

    San Jose, CA
    2 days ago
  • $206.4k - $379.1k

     ...impressive content. The AI Foundations team constructs a flexible, scalable AI framework that...  ...We're looking for a Principal Architect to build and implement...  ...infrastructure to support model training, fine‐tuning,...  ...frameworks. Develop high‐performance runtime services for... 
    Principal
    Training
    Performance
    Local area
    Worldwide
    Flexible hours

    Adobe

    San Jose, CA
    2 days ago
  •  ...This position is for a Senior Principal Engineer, AI/ML System Architect. As system architect, one will...  ...systems design including AI training and inference workloads and performance demands, as well as compute,...  ...AMD, Intel, or other modern GPU accelerators and support systems... 
    Principal
    Training
    Performance
    Local area
    Remote work

    Celestica

    San Jose, CA
    7 days ago
  • $254k - $349.25k

     ...protect how people, data, and AI agents connect across email...  ...Overview We are seeking a Principal ML Architect to lead the design and...  ...expertise in model architecture, training, fine‑tuning, and...  ...continuously improve model performance and reliability Productionization... 
    Principal
    Training
    Performance
    Flexible hours

    Proofpoint

    Sunnyvale, CA
    2 days ago
  • $254k - $349.25k

     ...protect how people, data, and AI agents connect across email...  ...We are seeking a Principal ML Architect to lead the design and development...  ...in model architecture, training, fine-tuning, and distillation...  ...continuously improve model performance and reliability... 
    Principal
    Training
    Performance
    Flexible hours

    Proofpoint

    Sunnyvale, CA
    4 days ago
  • $272k - $431.25k

     ...NVIDIA Gruppe is seeking experienced engineers for its Dynamo platform, focusing on scalable AI systems. You will develop the Kubernetes deployment, optimize GPU resource management, and work on intelligent routing and KV-cache management. Applicants should have 15+ years... 

    NVIDIA Gruppe

    Santa Clara, CA
    3 days ago
  • $184k - $287.5k

     ...pivotal role in crafting the future of GPU technology. At NVIDIA, you will work...  ..., optimizing along the axes of scalability/modularity, performance, area, yield, effort, and schedule....  ...an existing vacancy.  NVIDIA uses AI tools in its recruiting processes.... 
    Performance
    Work experience placement
    Night shift

    NVIDIA

    Santa Clara, CA
    4 days ago
  • NVIDIA Gruppe in Santa Clara is seeking a Deep Learning Communication Architect to optimize DNN models and enhance communication performance during distributed training. This role requires collaboration with hardware/software teams to implement efficient communication... 
    Training
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    1 day ago
  •  ...Job Summary The AI Interconnect Architect designs and engineers high-speed...  ..., power efficiency, scalability, and optimized transport protocols...  ...optimize power, cost, and performance across diverse workloads....  ...hardware architecture, including GPU/accelerator clusters and... 
    Principal
    Performance

    Compunnel

    Milpitas, CA
    5 days ago
  • $272k - $431.25k

     ...Always-On, low-overhead GPU profiling service that...  ...interfaces, data flows, and scalability guarantees for multi-...  .../platform layers, and performance counter/trace providers...  ...with existing ML/AI workflows (e.g., PyTorch...  ...on experience tuning ML training/inference loops based on... 
    Principal
    Training
    Performance

    NVIDIA

    Santa Clara, CA
    4 days ago
  •  ...experiences—from AI and data...  ...THE ROLE: As a Principal AI Infrastructure...  ...large‑scale LLM training and inference on...  ...strong expertise in GPU‑accelerated...  ...resilient, high‑performance AI workloads at...  ...and SLURM. Architect and validate Kubernetes...  ...applicable, scalable checkpointing)... 
    Principal
    Training
    Performance

    Advanced Micro Devices, Inc.

    Santa Clara, CA
    3 days ago
  • $272k - $431.25k

     ...is to innovate how we architect and develop our GPU for the changing AI and accelerated workloads...  ...ways to improve the scalability of our design, infrastructure...  .... We are looking for a Principal System Architect with a...  ...to fruition high-performance, high-volume System-on-... 
    Principal
    Performance

    NVIDIA

    Santa Clara, CA
    2 days ago
  • $272k - $431.25k

     ...Responsibilities Develop innovative high-performance processor and system architectures, focusing on the memory system and energy efficiency...  ...micro‑architecture features to improve the state‑of‑the‑art in GPU memory systems, optimizing along the axes of perf/W, perf/mm,... 
    Principal
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    2 days ago
  • $272k - $431.25k

     ...We are seeking a world-class Principal Memory Simulation Architect to design and develop the next-generation GPU memory and on-chip interconnect performance and functional models. Ideal candidates...  ...large-scale architectural simulation, AI-assisted model generation, and... 
    Principal
    Performance
    Night shift

    NVIDIA

    Santa Clara, CA
    2 days ago
  • $272k - $431.25k

     ...and creative solutions architect with experience in network...  ...to join the NVIDIA GPU Cloud Infrastructure team...  ...for the product into scalable technical architectures...  ...BGP, DNS, QUIC. High‑performance networking and low‑latency...  ...12, 2026. NVIDIA uses AI tools in its recruiting... 
    Principal
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    2 days ago
  •  ...computing experiences-from AI and data centers, to...  ...AMD's Data Center GPU organization is...  ...highly accomplished Principal Modeling Architect to join the Product Architecture...  ...requirements and performance projections. Your...  ...project performance, scalability, and resource utilization... 
    Principal
    Performance
    Remote work

    Advanced Micro Devices, Inc.

    San Jose, CA
    3 days ago
  • $296.3k

     ...Role: We are seeking a Principal AI Engineer to lead the...  ...powers large-scale training and cloud inference....  .... What You’ll Do: Architect, build, and optimize...  ...practices for reliability, scalability, and performance across the AI/ML...  ...systems, GPU computing, and cloud... 
    Principal
    Training
    Performance
    Remote work
    Flexible hours

    General Motors

    Sunnyvale, CA
    2 days ago
  •  ...generation computing experiences—from AI and data centers, to PCs, gaming and...  ...re Architect THE ROLE: As GPU Software Architect, you will provide technical...  ...architectural intent translates into working, performant, and scalable solutions for partnerships... 
    Principal
    Performance
    Remote work

    Advanced Micro Devices, Inc.

    Santa Clara, CA
    2 days ago
  • $166.52k - $249.5k

     ...Principal System Architect Marvell's semiconductor solutions...  ...enterprise, cloud and AI, and carrier...  ...for CPU as well as GPU to unlock memory wall...  ...problems for AI interface/training. The memory wall...  ...hardware level for scalability and performance optimization. Benchmark... 
    Principal
    Training
    Performance
    Permanent employment
    Work experience placement
    Internship
    Work from home

    Marvell

    Santa Clara, CA
    4 days ago
  • $185.2k - $299.48k

     ...and Inclusion. We weave AI into the fabric of...  ...create them. As a Sr Principal Engineer focused on the...  ...scale, security, and performance at the most critical point...  ...Innovation: Design, architect, and implement production...  ...(PoCs) to deploying scalable ML baselines and intelligent... 
    Principal
    Performance
    Full time
    Work at office
    Shift work

    Palo Alto Networks

    Santa Clara, CA
    2 days ago
  •  ...experiences-from AI and data centers,...  ...We are seeking a Principal Software Quality Engineer...  ...on AMD Instinct™ GPU platforms. You...  ...framework, workload, performance, stress, stability...  ...- LLM training and inference (PyTorch...  ...regression tracking. Architect the test... 
    Principal
    Training
    Performance
    Contract work
    Shift work

    Advanced Micro Devices, Inc.

    San Jose, CA
    7 days ago
  • $221.7k - $364.8k

     ...IP) that is applied to high-performance computing devices (mobile,...  ...and Responsibilities As a Principal of GPU Architect & Modeling, you will lead teams...  ...graphics, compute, and AI capabilities for millions of...  ...performance and power efficiency, scalable across Samsung’s premium... 
    Principal
    Performance
    Hourly pay
    Full time
    Worldwide
    Relocation

    Samsung Electronics Perú

    San Jose, CA
    3 days ago
  •  ...NVIDIA Gruppe in Santa Clara is seeking a technical leader for the GPU AI/HPC Infrastructure team. You will design and implement cutting...  ...edge GPU compute clusters, focusing on deep learning and high-performance computing. The ideal candidate will have at least 5+ years of... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    3 days ago
  • $157k - $271.4k

     ...empowerment, surgical performance, operating room (...  ...devices, data and AI driven insights in...  ...recruiting for a Principal Software Engineer...  ...to scaling training and inference across...  ...product and ship scalable APIs, SDKs, CLIs,...  ...model CI/CD, and GPU resources management... 
    Principal
    Training
    Performance
    Immediate start

    J&J Family of Companies

    Santa Clara, CA
    3 days ago
  •  ...Overview We are now looking for a Senior GPU & Deep Learning Architect to join the NVIDIA GPU Architecture...  ...architectures targeting both training and inference workloads. Advance the...  ...validate, and verify functional or performance models. Develop tests, test plans, and... 
    Training
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    2 days ago
