HPC Software Engineer for 1000+ GPU AI Training

Institute of Foundation Models

A leading AI research laboratory in California is seeking a High Performance Computing Software Engineer to design and develop robust software systems for large-scale AI workloads. The ideal candidate has experience optimizing software for AI training on 1000+ GPUs and a deep understanding of Linux internals. This role involves collaboration with researchers and engineers to enhance HPC capabilities, tackle complex challenges, and drive innovation in AI solutions. Competitive salary and comprehensive benefits are offered.

Vacancy posted 1 day ago
Similar jobs that could be interesting for you
Based on the HPC Software Engineer for 1000+ GPU AI Training in Sunnyvale, CA vacancy
  • $109k - $160k

     ...GPU Infrastructure Software Engineer Sunnyvale, CA CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers...  ...testing hardware at scale. HPC Experience. Experience...  ...with AI/ML infrastructure and training / inference. The base... 
    Training
    Permanent employment
    Temporary work
    Casual work
    Work at office
    Remote work
    Flexible hours

    CoreWeave

    Sunnyvale, CA
    1 day ago
  • $272k - $431.25k

     ...Principal AI and ML Infra Software Engineer, GPU Clusters We are seeking a Principal AI and ML Infra Software...  ...demonstrated expertise in AI/ML and HPC tasks and systems. ~ Hands-on...  ...and improving substantial distributed training operations using PyTorch (DDP, FSDP)... 
    Training

    NVIDIA

    Santa Clara, CA
    4 days ago
  • $109k - $160k

     ...The Essential Cloud for AI™. Built for pioneers by...  ...About the role A Software Engineer contributes to the design...  ...teams to evolve our GPU performance testing platform...  ...hardware at scale. HPC Experience....  .../ML infrastructure and training / inference. The base... 
    Training
    Permanent employment
    Temporary work
    Casual work
    Work at office
    Remote work
    Flexible hours

    CoreWeave

    Sunnyvale, CA
    6 days ago
  • $152k - $241.5k

     ...highly motivated, creative engineers to join the Platform Software team. You will work with a...  ...excellence: debug and root-cause GPU bottlenecks and issues for gaming, creator, and AI workload, validate BSP...  ...expertise across GPU SW stack, LLM training and inference, and Arm... 
    Training

    NVIDIA

    Santa Clara, CA
    3 days ago
  • $181k - $297k

     ..., CA. We are seeking an HPC Network Engineer to design, deploy, and operate...  ...fabrics for large-scale GPU clusters. The role focuses...  ...interconnect networks supporting AI/ML training, inference, and HPC...  ...systems, GPU, platform, and software teams to build scalable, lossless... 
    Training
    For contractors
    Work at office
    Flexible hours

    LinkedIn

    Mountain View, CA
    4 days ago
  • $150k - $300k

     ...next generation of AI builders, and...  ...foundation model training, alongside world...  ...scientists, and engineers, tackling the...  ...Performance Computing Software Engineer to help...  ...models—spanning 1000+ GPUs—and...  ...including Linux kernel, GPU/accelerator...  ...knowledge of HPC job scheduling and... 
    Training

    Institute of Foundation Models

    Sunnyvale, CA
    2 days ago
  • $149.6k - $211.2k

     ...Develops and/or validates software that enables Intel GPUs...  ...Science, Computer Engineering, Mathematics or related...  ...project. Experience with AI/Machine Learning tools...  ...to enable the AI PC and GPU IP to support all of Intel...  ...relevant education or training. Your recruiter can... 
    Training
    Internship
    Immediate start
    Shift work

    Intel Corporation

    Santa Clara, CA
    4 days ago
  • $207k - $300k

    Software Engineer, GDC LLM Serving and GPU Performance Google Sunnyvale, CA, USA Qualifications Bachelor’s degree...  ...Language Models? Join the GDC AI Models and Performance team and work...  ...experience, and relevant education or training. Your recruiter can share more... 
    Training
    Full time

    Google Inc.

    Sunnyvale, CA
    3 days ago
  • $262k - $365k

    Senior Staff Software Architect, GPU Uber Tech Leads...  ...Science, Electrical Engineering, a related technical...  ...Performance Computing (HPC) systems and...  ...that power Google’s AI and HPC infrastructure...  ...relevant education or training. Your recruiter can share... 
    Training
    Full time

    Google Inc.

    Sunnyvale, CA
    3 days ago
  • $181.1k - $318.4k

     ...GPU Software Architecture Engineer, Graphics, Games, & ML Apple Silicon GPU SW architecture team within the...  ...as you help define the future of AI experiences delivered through Apple's...  ...InfiniBand, RDMA, NCCL) in the context of ML training/inference ~ Must have excellent... 
    Training
    Relocation

    Apple

    Cupertino, CA
    7 hours ago
  • $272k - $431.25k

    NVIDIA Corporation seeks a Principal AI and ML Infra Software Engineer in Santa Clara, California, to enhance the efficiency of AI/ML research on GPU Clusters. The role involves collaboration...  ...should have extensive experience in HPC systems, programming, and a strong educational... 

    NVIDIA Corporation

    Santa Clara, CA
    2 days ago
  • $152k - $241.5k

     ...tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers,...  .... We are looking for a Senior Software Engineer to join our mission to continue improving our HPC infrastructure. Our team builds... 

    NVIDIA

    Santa Clara, CA
    3 days ago
  • $184k - $287.5k

     ...Visualization. The GPU, our invention, serves...  ...Deep Learning engineer to bring advanced CUDA...  ...technologies into AI stacks, including PyTorch...  ...Deep Learning and HPC applications. Your...  ..., ranging from training on scales up to 100...  ...principles (aka systems software fundamentals) ~... 
    Training

    NVIDIA

    Santa Clara, CA
    1 day ago
  • $152k - $241.5k

     ...NVIDIA seeks a senior software engineer to join the AI Networking co-design and benchmark...  ...AI workloads across large GPU and CPU clusters, thereby...  ...Learning, particularly within LLM training and inference stacks. A...  ...of the following areas: HPC, networking, and AI applications... 
    Training

    NVIDIA

    Santa Clara, CA
    7 hours ago
  •  ...Software Engineer III Be an integral part of an agile team that's constantly pushing the envelope...  ...and batching. Deploy and manage GPU workloads in Kubernetes environments....  ...capabilities, and skills Formal training or certification on software engineering... 
    Training

    Chase

    Palo Alto, CA
    1 day ago
  •  ...generation computing experiences-from AI and data centers, to PCs,...  ...looking for an influential software engineer who is passionate about...  ...challenges in the industry: training and running AI to make AI itself...  ...performance from the lowest-level GPU kernels to large-scale... 
    Training

    Advanced Micro Devices, Inc.

    Santa Clara, CA
    7 hours ago
  •  ...accelerate next‑generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems....  ...shape the future of AI and beyond. Principal / Senior GPU Software Performance Engineer — Post‑Training THE ROLE: Drive the performance of post‑training... 
    Training

    Advanced Micro Devices

    San Jose, CA
    5 days ago
  • $262k - $365k

    Google Inc. is seeking a Senior Staff Software Architect to lead the development of innovative software technologies for AI and HPC infrastructure. The ideal candidate will possess...  ...degree in Computer Science or Electrical Engineering and have 8 years of experience in... 

    Google Inc.

    Sunnyvale, CA
    3 days ago
  • $122.44k - $232.19k

    The Role and Impact: As a GPU Logic Design Engineer at Intel, you will play a critical role in shaping...  ...‑purpose compute, web services, HPC, and AI‑accelerated systems. Our charter encompasses...  ..., and relevant education or training. Your recruiter can share more about... 
    Training
    Local area
    Immediate start
    Worldwide
    Flexible hours
    Shift work

    Intel Corporation

    Santa Clara, CA
    5 days ago
  • $207k - $300k

     ...years of experience in software development. 5 years of...  ...Experience with modern GPU architectures (NVIDIA, AMD, or other AI accelerators), memory hierarchies...  ..., etc.) and performance engineering techniques. Preferred...  ...relevant education or training. Your recruiter can... 
    Training
    Full time
    Temporary work
    Worldwide

    Google

    Sunnyvale, CA
    1 day ago
  • $131k - $226k

    Job title Senior Software Engineer - Velox Operators for GPU Location San Jose, California, United States Salary Range USD 131...  ...employees where required by applicable law Training and educational resources on our personalized, AI‑driven learning platform where IBMers can... 
    Training
    Full time
    Temporary work

    IBM

    San Jose, CA
    1 day ago
  • $152k - $241.5k

     ...Performance Computing and Visualization. The GPU, our invention, serves as the visual...  ...are looking for highly motivated Senior Software Engineers to work on our GPU Fabric Networking...  ...existing vacancy. NVIDIA uses AI tools in its recruiting processes. NVIDIA... 
    Remote work

    NVIDIA

    Santa Clara, CA
    3 days ago
  • HPC & AI Senior Performance Engineer This role has been...  ...AI Senior Performance Engineer HPE is seeking an experienced...  ...-scale HPC systems and software Experience with large-...  ...of multiprocessor and GPU hardware architectures,...  ...experience, education/training, and/or skill level. -... 
    Training
    Work experience placement
    Remote work
    Work from home
    Worldwide

    Hewlett Packard Enterprise Development LP

    San Jose, CA
    5 days ago
  •  ...computing experiences-from AI and data centers, to...  ...seeking a Principal Software Quality Engineer to serve as the senior...  ...on AMD Instinct™ GPU platforms. You will set...  ...characterization - LLM training and inference (PyTorch...  ...recommender systems, scientific HPC kernels, MLPerf-class... 
    Training
    Contract work
    Shift work

    Advanced Micro Devices, Inc.

    San Jose, CA
    3 days ago
  •  ...computing experiences-from AI and data centers, to...  ...career. SENIOR GPU FIRMWARE ENGINEER Firmware Application...  ...across Cloud, HPC, and OEM segments. You...  ...collaborating across software stacks to deliver optimized...  ..., guidelines, and training materials.... 

    Advanced Micro Devices, Inc.

    Santa Clara, CA
    4 days ago
  •  ...builds the world's largest AI chip, 56 times larger...  ...deliver industry-leading training and inference speeds and...  ...10 times faster than GPU-based hyperscale cloud inference...  ...complex hardware/software systems and driving issues...  ...-performance clusters, HPC systems, or custom hardware... 
    Training

    CEREBRAS SYSTEMS INC.

    Sunnyvale, CA
    4 days ago
  • $152k - $241.5k

     ...and Visualization. The GPU, our invention, serves as...  ...UCX for Deep Learning and HPC. We are looking for a motivated Performance engineer to influence the roadmap...  ...principles (aka systems software fundamentals) ~...  ...vacancy. NVIDIA uses AI tools in its recruiting... 
    Remote work

    NVIDIA

    Santa Clara, CA
    7 hours ago
  • $152k - $241.5k

     ...and Visualization. The GPU, our invention, serves...  ...highly motivated Senior Software Engineers to join our Fabric Networking...  ..., and large-scale AI infrastructure,...  ...CUDA, and large-scale AI/HPC clusters such as NVIDIA...  ...interconnects, and distributed training/inference systems.... 
    Training
    Full time

    NVIDIA

    Santa Clara, CA
    5 days ago
  • $140k - $240k

     ...the world's largest AI chip, 56 times...  ...deliver industry-leading training and inference...  ...times faster than GPU-based hyperscale cloud...  ...accelerator systems, 1000's of high-end servers...  ...-first based engineering. Cerebras cluster involves...  ...cluster management software stack - all the way... 
    Training

    CEREBRAS SYSTEMS INC.

    Sunnyvale, CA
    7 hours ago
  • black.ai is looking for a skilled platform engineer in Palo Alto to enhance our AWS infrastructure and support quantum simulations. This role requires strong...  ...in platform engineering, DevOps practices, and GPU workloads. As a platform engineer, you will improve deployment... 

    black.ai

    Palo Alto, CA
    2 days ago
