
Principal AI Performance Architect - Large-Scale GPU Training

Advanced Micro Devices, Inc.

Advanced Micro Devices, Inc. is seeking a Principal Engineer to lead the development of next-generation AI infrastructure. Your role will involve defining GPU architecture specifications and optimizing large-scale model training processes.

Candidates should have deep expertise in GPU microarchitecture and experience with CUDA programming. This position allows for remote work for the right individual, with a preference for candidates based in Santa Clara, CA.

Vacancy posted 3 days ago
Similar jobs that could be interesting for you
Based on the Principal AI Performance Architect - Large-Scale GPU Training in Santa Clara, CA vacancy
  •  ...Micro Devices is looking for a Principal Engineer in Santa Clara, CA to lead AI infrastructure development, define GPU architecture specifications, and drive performance gains in ML systems. The role...  ...programming, and optimizing large-scale ML systems. A Bachelor's, MS... 
    Principal
    Training
    Performance

    Advanced Micro Devices, Inc.

    Santa Clara, CA
    18 hours ago
  •  ...The Role As a Principal Engineer, you...  ...generation of AI infrastructure by defining GPU architecture specifications...  ...massive model training at scale. Your expertise...  ...drive 2-3x performance gains in both...  ...impact on large‐scale ML workloads...  ...dimensions Architect memory‐... 
    Training
    Performance
    Remote work

    AMD

    Santa Clara, CA
    3 days ago
  • $272k - $431.25k

     ...solving some of AI’s hardest...  ...interconnects. This Principal Architect role leads the...  ...communicate at scale—across GPUs, DPUs...  ...systems—GPU-to-GPU, GPU-to-...  ...communication runtimes for large-scale AI...  ...expertise in high-performance networking (...  ...or distributed training and inference patterns... 
    Principal
    Training
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    3 days ago
  •  ...is for a Senior Principal Engineer, AI/ML System Architect. As system architect...  ...including AI training and inference workloads and performance demands, as well...  ...interact with large OEM customers on...  ...or other modern GPU accelerators and...  ...drawing board to full-scale production and... 
    Principal
    Training
    Performance
    Local area
    Remote work

    Celestica

    San Jose, CA
    8 days ago
  • $254k - $349.25k

     ...people, data, and AI agents connect...  ...Fortune 100, 10,000 large enterprises, and...  ...We are seeking a Principal ML Architect to lead the...  ...model architecture, training, fine-tuning, and...  ...of operating at scale across high-volume...  ...continuously improve model performance and reliability... 
    Principal
    Training
    Performance
    Flexible hours

    Proofpoint

    Sunnyvale, CA
    18 hours ago
  • $254k - $349.25k

     ...people, data, and AI agents connect...  ...Fortune 100, 10,000 large enterprises, and...  ...We are seeking a Principal ML Architect to lead the design...  ...architecture, training, fine‑tuning, and...  ...of operating at scale across high-volume...  ...continuously improve model performance and reliability... 
    Principal
    Training
    Performance
    Flexible hours

    Proofpoint

    Sunnyvale, CA
    3 days ago
  • $272k - $431.25k

     ...serving generative AI and reasoning...  ...Built in Rust for performance and Python for extensibility...  ...orchestrates GPU shards, routes...  ...at datacenter scale. As large language models rapidly...  ...We are seeking a Principal Systems Engineer...  ...LLM inference. Architect and implement... 
    Principal
    Performance
    Local area
    Remote work

    NVIDIA Gruppe

    Santa Clara, CA
    3 days ago
  •  ...computing experiences—from AI and data centers, to...  ...a Robotics AI Architect to define and scale next‑generation Physical...  ...production‑grade performance targets. THE PERSON As...  ...co‑design across CPU, GPU, and accelerators Lighthouse...  ...subsystems Cloud (training, simulation, fleet... 
    Principal
    Training
    Performance

    Advanced Micro Devices, Inc.

    San Jose, CA
    3 days ago
  •  ...position is for a Senior Principal Engineer, AI/ML System Architect. As system architect,...  ...design including AI training and inference workloads and performance demands, as well as...  ...and interact with large OEM customers on ongoing...  ..., or other modern GPU accelerators and support... 
    Principal
    Training
    Performance
    Local area
    Remote work

    Celestica

    San Jose, CA
    3 days ago
  • $206.4k - $379.1k

     ...whether individuals or large organizations, to...  ...content. The AI Foundations team...  ...creativity at scale in design, imaging...  ...re looking for a Principal Architect to build and implement...  ...to support model training, fine‐tuning,...  ...Develop high‐performance runtime services... 
    Principal
    Training
    Performance
    Local area
    Worldwide
    Flexible hours

    Adobe

    San Jose, CA
    3 days ago
  •  ...experiences-from AI and data...  ...are seeking a Principal Software Quality...  ...Instinct™ GPU platforms. You...  ..., workload, performance, stress, stability, scale-out, and system...  ...- LLM training and inference...  ...tracking. ~ Architect the test infrastructure...  ...systems and large-scale cluster... 
    Principal
    Training
    Performance
    Contract work
    Shift work

    Advanced Micro Devices, Inc.

    San Jose, CA
    3 days ago
  •  ...Clara is seeking a technical leader for the GPU AI/HPC Infrastructure team. You will design...  ..., focusing on deep learning and high-performance computing. The ideal candidate will have at least 5+ years of experience with large-scale infrastructure, strong programming... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    2 days ago
  •  ...seeks a Staff HPC Engineer to design and optimize large scale compute environments for scientific computing and AI workloads. The ideal candidate should have...  ...researchers and developers, focusing on scalability and performance optimization. KLA offers competitive benefits... 
    Performance

    KLA-Belgium

    Milpitas, CA
    2 days ago
  • $208k - $327.75k

     ...accelerated computing, AI, and autonomous...  ...-the-art AI, high-performance compute, and...  ...looking for a Senior AI Architect to help define the...  ...SoCs, including GPU, CPU, DLA, memory...  ...architectures and large-scale model systems Experience...  ...of distributed training systems, scaling... 
    Training
    Performance
    Worldwide

    NVIDIA Corporation

    Santa Clara, CA
    18 hours ago
  • $240k - $379.5k

     ...customers where they are on their AI journey on our GPUs - this...  ...Manager for AI Platform training you will be responsible for...  ...model builders to get the best large scale performance, resilience and experience...  ...enabling deep learning across all GPU use cases and providing... 
    Principal
    Training
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    2 days ago
  •  ...Principal AI Agent / ML Software Engineer The...  ...used in large-scale, business-critical...  ...observability. Design, architect, and deliver...  ...high throughput, GPU efficiency, reliability...  ...reliability, performance, security posture...  ...GPU inference or training workloads for latency... 
    Principal
    Training
    Performance

    Oracle

    Santa Clara, CA
    47 minutes ago
  • General Motors is seeking a Principal Engineer to lead the...  ...and visualizing AV model performance. This role will involve scaling state-of-the-art tools and...  ...teams within the Embodied AI group. The ideal candidate...  ..., and experience with large-scale systems. Exceptional... 
    Principal
    Performance

    General Motors

    Sunnyvale, CA
    2 days ago
  • $272k - $431.25k

     ...We are seeking a Principal AI and ML Infra Software Engineer, GPU Clusters at NVIDIA to join our Hardware Infrastructure...  .... Monitor and optimize the performance of our infrastructure ensuring high...  ...substantial distributed training operations using PyTorch (DDP, FSDP... 
    Principal
    Training
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  •  ...experiences—from AI and data centers,...  ...THE ROLE: As a Principal AI Infrastructure...  ...customers to enable large‑scale LLM training and inference on...  ...expertise in GPU‑accelerated computing...  ...resilient, high‑performance AI workloads at scale...  ...and SLURM. Architect and validate Kubernetes... 
    Principal
    Training
    Performance

    Advanced Micro Devices

    Santa Clara, CA
    3 days ago
  •  ...experiences—from AI and data centers,...  ...AMD’s Data Center GPU organization is...  ...highly accomplished Principal Modeling Architect to join the...  ...requirements and performance projections. Your...  ..., datatypes, and scaling methodologies to...  ...analyzing AI/ML, HPC, or large‑scale data... 
    Principal
    Performance
    Remote work

    Advanced Micro Devices, Inc.

    San Jose, CA
    3 days ago
  •  ...computing experiences-from AI and data centers,...  ...improving the performance of key applications...  .../ML workloads and GPU-accelerated computing...  ..., including Large Language Models (LLMs...  ...and accelerate LLM training and inference on AMD...  ...clusters, including large-scale training and... 
    Principal
    Training
    Performance

    Advanced Micro Devices, Inc.

    Santa Clara, CA
    3 days ago
  • $272k - $431.25k

     ...creative solutions architect with experience in...  ...to join the NVIDIA GPU Cloud Infrastructure...  ...capacity planning and scale testing. Ensure...  ..., DNS, QUIC. High‑performance networking and low‑...  ...Experience designing large‑scale distributed...  ...2026. NVIDIA uses AI tools in its recruiting... 
    Principal
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    2 days ago
  • $296.3k

     ...Role: We are seeking a Principal AI Engineer to lead the...  ...that powers large-scale training and cloud inference....  ...autonomy. What You’ll Do: Architect, build, and optimize...  ...reliability, scalability, and performance across the AI/ML...  ...distributed systems, GPU computing, and cloud... 
    Principal
    Training
    Performance
    Remote work
    Flexible hours

    General Motors

    Sunnyvale, CA
    1 day ago
  •  ...transportation on a global scale. The Data...  ...AV product performance through smart...  ...existing very large datasets that...  ...foundation model pre-training and fine-tuning...  ...impact team of AI/ML engineers,...  .... As a Principal Technical Lead...  ...in designing, architecting, and deploying... 
    Principal
    Training
    Performance
    Local area
    Remote work
    Work from home
    Relocation
    Relocation package
    Flexible hours

    General Motors

    Mountain View, CA
    3 days ago
  • $272k - $431.25k

     ...unlimited potential of AI to define the next era...  .... An era in which our GPU acts as the brains of...  ...world. At NVIDIA, as a Principal Rack Scale Systems Infrastructure...  ...engineering bar for large‑scale networked systems...  ...silicon, or other high‑performance computing systems.... 
    Principal
    Performance
    Shift work

    NVIDIA Corporation

    Santa Clara, CA
    8 hours ago
  • $296.3k

     ...a part of the Scaling Foundations team in Embodied AI and is responsible...  ...for model training and evaluation,...  ...utilization of our large datasets,...  ...vehicles. Role As a Principal Engineer in the...  ...visualize AV model performance. As a full‑...  ...modern cloud / GPU infrastructure,... 
    Principal
    Training
    Performance
    Local area
    Flexible hours

    General Motors

    Sunnyvale, CA
    2 days ago
  • NVIDIA Gruppe in Santa Clara is looking for a Senior Systems Software Engineer to focus on GPU performance at scale. You will be instrumental in driving innovation in AI and GPU computing, contributing to state-of-the-art computing hardware. The ideal candidate has extensive... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    2 days ago
  •  ...globally for innovation, performance and quality....  ...In this AI/ML ASIC Architecture...  ...As an AI/ML ASIC Architect you will help drive...  ...the AI Storage with GPU/TPU/xPU accelerators...  ...efficient inference/training systems utilizing...  ...experience optimizing large-scale ML systems, GPU... 
    Training
    Performance
    Temporary work
    Remote work
    Flexible hours
    Shift work
    Night shift

    Sandisk

    Milpitas, CA
    18 days ago
  • $184k - $356.5k

     ...Software Engineer in Santa Clara to enhance the performance and reliability of large-scale AI infrastructures. The role involves leadership in debugging and optimizing distributed training workloads across NVIDIA’s GPU platforms. Ideal candidates should have extensive... 
    Training
    Performance

    NVIDIA Corporation

    Santa Clara, CA
    1 day ago
  •  ...feature engineering, model training, deployment,...  ...explainability, and responsible AI compliance. **...  ...**Proven experience** architecting large-scale ML/AI platforms with attention to performance, scalability, and maintainability...  .... The ideal Principal has a deep technical science... 
    Principal
    Training
    Performance

    Walmart Canada

    Sunnyvale, CA
    3 days ago
