Principal AI Performance Architect for Scalable GPU Training

Advanced Micro Devices

Advanced Micro Devices is looking for a Principal Engineer in Santa Clara, CA to lead AI infrastructure development, define GPU architecture specifications, and drive performance gains in ML systems. The role involves leading the development of innovative techniques, collaborating with stakeholders, and establishing best practices for distributed ML systems. The ideal candidate has extensive experience in GPU architectures, CUDA programming, and optimizing large-scale ML systems. A Bachelor's, MS, or PhD in Computer Science or Engineering is required.

Vacancy posted 3 days ago
Similar jobs that could be interesting for you, based on the Principal AI Performance Architect for Scalable GPU Training vacancy in Santa Clara, CA
  • $272k - $431.25k

    We are seeking a Principal AI and ML Infra Software Engineer, GPU Clusters at NVIDIA to join our...  ...potent, effective, and scalable solutions as we mold...  ...Monitor and optimize the performance of our infrastructure ensuring...  ...distributed training operations using PyTorch... 
    Principal
    Training
    Performance

    NVIDIA Corporation

    Santa Clara, CA
    4 days ago
  • $272k - $431.25k

     ...group is solving some of AI’s hardest...  ...interconnects. This Principal Architect role leads the research...  ...communication systems—GPU-to-GPU, GPU-to-storage...  ...deep expertise in high-performance networking (InfiniBand...  ...parallelism, or distributed training and inference patterns... 
    Principal
    Training
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  •  ...experiences—from AI and data centers,...  ...career. THE ROLE As a Principal Engineer, you will...  ...by defining GPU architecture specifications...  ...massive model training at scale. Your...  ...expertise will drive 2-3x performance gains in both...  ...dimensions Architect memory‑efficient training... 
    Principal
    Training
    Performance
    Remote work

    Advanced Micro Devices

    Santa Clara, CA
    3 days ago
  • $272k - $431.25k

     ...Responsibilities Develop innovative high-performance processor and system architectures, focusing on the memory system and energy efficiency...  ...micro‑architecture features to improve the state‑of‑the‑art in GPU memory systems, optimizing along the axes of perf/W, perf/mm,... 
    Principal
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  • Graphcore in Milpitas is looking for a highly accomplished GPU Architect to lead the design of AI accelerators and multi-GPU clusters. The role involves...  ...across hardware and software teams to ensure optimal performance and reliability of GPU infrastructures as AI demands... 
    Performance

    Graphcore

    Milpitas, CA
    2 days ago
  • NVIDIA Gruppe in Santa Clara is seeking a Deep Learning Communication Architect to optimize DNN models and enhance communication performance during distributed training. This role requires collaboration with hardware/software teams to implement efficient communication... 
    Training
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    15 hours ago
  • $208k - $327.75k

     ...accelerated computing, AI, and autonomous...  ...-of-the-art AI, high-performance compute, and scalable software-defined architectures...  ...for a Senior AI Architect to help define the next...  ...SoCs, including GPU, CPU, DLA, memory hierarchy...  ...of distributed training systems, scaling laws... 
    Training
    Performance
    Worldwide

    NVIDIA Corporation

    Santa Clara, CA
    3 days ago
  •  ...experiences—from AI and data...  ...THE ROLE: As a Principal AI Infrastructure...  ...large‑scale LLM training and inference on...  ...strong expertise in GPU‑accelerated...  ..., high‑performance AI workloads at...  ...Kubernetes and SLURM. Architect and validate...  ...where applicable, scalable checkpointing)... 
    Principal
    Training
    Performance

    Advanced Micro Devices

    Santa Clara, CA
    1 day ago
  • $272k - $431.25k

     ...Always‑On, low‑overhead GPU profiling service that...  ...interfaces, data flows, and scalability guarantees for multi‑...  .../platform layers, and performance counter/trace providers...  ...with existing ML/AI workflows (e.g., PyTorch...  ...on experience tuning ML training/inference loops based on... 
    Principal
    Training
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    15 hours ago
  •  ...world's largest AI chip, 56 times...  ...industry-leading training and inference...  ...times faster than GPU‑based...  ...We're hiring a Principal Engineer for our...  ...workloads, high‑QPS performance, and the operating...  ...& Performance. Architect active‑active...  ...requirements into scalable system designs... 
    Principal
    Training
    Performance

    Cerebras Systems, Inc.

    Sunnyvale, CA
    3 days ago
  • $180k - $200k

     ...the World’s leading AI-first Quality...  ...looking for a Gen AI Architect to join our growing...  ...solutions are scalable, secure, compliant...  ...LangGraph) that meet performance, scalability, and...  ...unstructured) for training, fine-tuning, and...  ...Optimization - Implement GPU optimization,... 
    Training
    Performance
    Full time
    Casual work
    Local area
    Flexible hours

    QualityAI

    Santa Clara, CA
    4 days ago
  • $224k - $356.5k

     ...artificial intelligence (AI) / deep learning (DL), high-performance computing (HPC),...  ...socket CPU and CPU/GPU systems. Work...  ...CPU and interconnect architects to improve future...  ...enabling faster AI model training, agentic use-cases,...  ...processing, and scalable cloud deployments.... 
    Training
    Performance

    NVIDIA

    Santa Clara, CA
    2 days ago
  • $296.3k

     ...Role: We are seeking a Principal AI Engineer to lead the...  ...powers large-scale training and cloud inference....  .... What You’ll Do: Architect, build, and optimize...  ...for reliability, scalability, and performance across the AI/ML platform...  ...systems, GPU computing, and cloud... 
    Principal
    Training
    Performance
    Remote work
    Flexible hours

    General Motors

    Sunnyvale, CA
    4 days ago
  •  ...generation computing experiences—from AI and data centers, to PCs, gaming and...  ...re Architect THE ROLE: As GPU Software Architect, you will provide technical...  ...architectural intent translates into working, performant, and scalable solutions for partnerships... 
    Principal
    Performance
    Remote work

    Advanced Micro Devices

    Santa Clara, CA
    15 hours ago
  • $192k - $267k

    Principal Architect, AI and Semiconductors, Google Cloud Google San Francisco, CA, USA; Sunnyvale...  ...developed for security, reliability, and scalability, running the full stack from...  ...experience, and relevant education or training. US: $192000 - $267000 (USD) + 42.86... 
    Principal
    Training

    Google Inc.

    Sunnyvale, CA
    4 days ago
  • $190k - $280k

     ...experience, education, and training. We also offer incentive opportunities...  ...on individual and company performance. This is in addition to our...  ...the potential of generative AI to power the transformation...  ...per week Hybrid. The role: Principal Architect - Performance Analysis and... 
    Principal
    Training
    Performance
    Full time
    3 days per week

    MixMode

    Santa Clara, CA
    4 days ago
  • $209.5k - $299.2k

     ...evolving markets like AI, cloud, networking, and...  ...the Role The DFT Architect at Altera is a senior...  ...of innovation in high‑performance compute, AI acceleration...  ...generations. You will architect scalable, robust, and forward‑...  ..., experience, and training. Incentive... 
    Principal
    Training
    Performance
    Local area
    Shift work

    Altera Corporation

    San Jose, CA
    4 days ago
  • NVIDIA Gruppe in Santa Clara is seeking a technical leader for the GPU AI/HPC Infrastructure team. You will design and implement cutting...  ...edge GPU compute clusters, focusing on deep learning and high-performance computing. The ideal candidate will have at least 5 years of... 
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    15 hours ago
  • $296.3k

     ...team in Embodied AI and is responsible...  ...datasets for model training and evaluation, now...  ...reliability, and scalability of next‑generation...  ...vehicles. Role As a Principal Engineer in the...  ...visualize AV model performance. As a full‑stack...  ...on modern cloud / GPU infrastructure, with... 
    Principal
    Training
    Performance
    Local area
    Flexible hours

    Dormont Manufacturing Co

    Sunnyvale, CA
    2 days ago
  • $272k - $431.25k

    NVIDIA Corporation seeks a Principal AI and ML Infra Software Engineer in Santa Clara, California...  ...the efficiency of AI/ML research on GPU Clusters. The role involves collaboration...  ...teams, monitoring infrastructure performance, and implementing improvements based on... 
    Principal
    Performance

    Jobleads-US

    Santa Clara, CA
    15 hours ago
  • Overview We are now looking for a Senior GPU & Deep Learning Architect to join the NVIDIA GPU Architecture...  ...architectures targeting both training and inference workloads. Advance the...  ...validate, and verify functional or performance models. Develop tests, test plans, and... 
    Training
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    15 hours ago
  •  ...the next generation of AI breakthroughs and power...  ...engineers and systems architects, Graphcore enjoys a culture...  ...experienced GPU Architect to define the...  ...trillion‑parameter LLM training and high‑throughput localized...  ...reliability, and interconnect performance strategies that ensure... 
    Training
    Performance

    Graphcore

    Milpitas, CA
    2 days ago
  • $172k - $349k

    Principal Software Engineer, Systems/Solutions...  ...reliability, scalability, resiliency, and performance across highly complex...  ...across product lines. Architect scalable, reusable,...  ...adoption of AI-assisted testing workflows...  ..., education/training, and/or skill level... 
    Principal
    Training
    Performance
    Work experience placement
    Work at office
    2 days per week

    Hewlett Packard Enterprise Development LP

    Sunnyvale, CA
    4 days ago
  • $160k - $225k

     ...seeking a Senior Software Engineer for AI Runtime in Mountain View, California. This...  ...and scaling systems for large-scale GPU training, contributing to the architecture of a managed...  ...$225,000, with additional benefits and performance bonuses. 
    Training
    Performance

    United States Digital Space LLC

    Mountain View, CA
    1 day ago
  • $128k - $312k

     ...Expect The Tesla AI Hardware team is at...  ...built to efficiently train massive neural...  ...computational efficiency and performance. By creating...  ...the AI/ML Compute Architect will drive the...  ...create efficient, scalable solutions that power...  ...knowledge of CPU, GPU, and ML... 
    Training
    Performance
    Hourly pay
    Temporary work
    Work at office
    Flexible hours

    Tesla

    Palo Alto, CA
    15 hours ago
  • $143k - $286k

     ...ML, causal inference, and Gen‑AI. Desirable Gen‑AI skills: embedding...  ..., MapReduce, HQL, Scala), and GPU/CUDA for computational...  ...000‑$264,000 in Hoboken) plus performance‑based bonuses. 401(k) match...  ...reimbursement, and access to internal training. Special considerations... 
    Principal
    Training
    Performance
    Temporary work
    Flexible hours

    Walmart

    Sunnyvale, CA
    4 days ago
  • NVIDIA Corporation is searching for a Senior GPU Architect in Santa Clara, CA to innovate and contribute to the design of our proprietary...  ...utilizing hardware modeling and verification to enhance GPU performance insights. Prospective candidates should possess a Master's or... 
    Performance
    Remote job

    NVIDIA Corporation

    Santa Clara, CA
    4 days ago
  • d-Matrix is seeking a Principal Compute Design Engineer to be responsible for...  ...micro-architecture and design of AI sub-system modules. You will collaborate with System Architects to develop efficient solutions, ensuring high performance and efficiency in RTL design. The... 
    Principal
    Performance
    3 days per week

    MixMode

    Santa Clara, CA
    2 days ago
  •  ...NVIDIA Gruppe seeks a Principal Architect to drive the architectural vision for AI communication systems. This role involves setting the technical direction,...  ...networking and systems software, particularly in high-performance environments, as well as a degree in Computer... 
    Principal
    Performance

    Jobleads-US

    Santa Clara, CA
    15 hours ago
  • $240k - $379.5k

     ...customers where they are on their AI journey on our GPUs - this...  ...Product Manager for AI Platform training you will be responsible for...  ...builders to get the best large scale performance, resilience and experience on...  ...deep learning across all GPU use cases and providing great... 
    Principal
    Training
    Performance

    NVIDIA Gruppe

    Santa Clara, CA
    15 hours ago