HPC Engineer

$150k - $300k

Institute of Foundation Models

This role provides operational coverage during Abu Dhabi overnight hours and serves as a primary point of contact for infrastructure monitoring, incident triage, researcher support, and production operations.

Responsibilities

  • Monitor health, performance, and availability of large-scale GPU clusters.
  • Respond to incidents and perform first-level triage.
  • Support researchers and troubleshoot job failures.
  • Execute operational runbooks and recovery procedures.
  • Validate cluster deployments, upgrades, and maintenance activities.
  • Track infrastructure utilization and operational metrics.
  • Develop automation and monitoring tools.
  • Contribute to documentation and reporting.

Education

Bachelor's degree in Computer Science, Computer Engineering, Software Engineering, Information Technology, Electrical Engineering, Mathematics, Physics, or related disciplines.

Experience

  • 2+ years in Linux systems administration, SRE, DevOps, cloud operations, HPC, or infrastructure operations.
  • Strong Linux troubleshooting skills.
  • Scripting experience in Python or Bash.

Preferred Qualifications

  • Slurm.
  • GPU infrastructure.
  • AWS, Azure, or GCP.
  • Grafana, Prometheus, Datadog, or similar tools.
  • Containers and Kubernetes.
  • AI/ML infrastructure exposure.
  • Research computing environments.

Salary Range

$150,000 - $300,000 a year

Benefits Include

  • Comprehensive medical, dental, and vision benefits
  • Bonus
  • 401(k) plan
  • Generous paid time off, sick leave and holidays
  • Paid Parental Leave
  • Employee Assistance Program
  • Life and disability insurance

Vacancy posted 1 day ago