HPC Engineer

$150k - $300k

Institute of Foundation Models

About MBZUAI
The Institute for Foundation Models (IFM) operates some of the world's largest AI supercomputing environments.

Position Summary
This role provides operational coverage during Abu Dhabi overnight hours and serves as a primary point of contact for infrastructure monitoring, incident triage, researcher support, and production operations.

Responsibilities

• Monitor health, performance, and availability of large-scale GPU clusters.
• Respond to incidents and perform first-level triage.
• Support researchers and troubleshoot job failures.
• Execute operational runbooks and recovery procedures.
• Validate cluster deployments, upgrades, and maintenance activities.
• Track infrastructure utilization and operational metrics.
• Develop automation and monitoring tools.
• Contribute to documentation and reporting.

Education

Bachelor's degree in Computer Science, Computer Engineering, Software Engineering, Information Technology, Electrical Engineering, Mathematics, Physics, or related disciplines.

Experience

• 2+ years in Linux systems administration, SRE, DevOps, cloud operations, HPC, or infrastructure operations.
• Strong Linux troubleshooting skills.
• Scripting experience in Python or Bash.

Preferred Qualifications

• Slurm.
• GPU infrastructure.
• AWS, Azure, or GCP.
• Grafana, Prometheus, Datadog, or similar tools.
• Containers and Kubernetes.
• AI/ML infrastructure exposure.
• Research computing environments.

Salary Range

$150,000 - $300,000 a year

The posted salary range represents the company's good-faith estimate of the compensation for this position upon hire. The actual compensation offered may vary within this range depending on individual qualifications, including but not limited to relevant skills, experience, education, certifications, geographic location, and specific business needs.

Benefits Include

• Comprehensive medical, dental, and vision benefits
• Bonus
• 401(k) plan
• Generous paid time off, sick leave, and holidays
• Paid parental leave
• Employee Assistance Program
• Life and disability insurance

Vacancy posted 3 days ago
Similar jobs that could be interesting for you
Based on the HPC Engineer in Sunnyvale, CA vacancy
  • $165k - $220k

     ...CX organization aligns closely with the internal and customer engineering teams, offering valuable insights from the field and having the...  ...focusing on AI/ML workloads within high-performance compute (HPC) environments Collaborate closely with customers to understand... 
    Suggested
    Permanent employment
    Temporary work
    Casual work
    Work at office
    Flexible hours

    CoreWeave

    Sunnyvale, CA
    14 days ago
  • $148.7k - $201.2k

     ...performance across our product line. The Nitro Team is looking for engineers with systems knowledge and experience in areas such as Linux OS...  ...performance computing workloads. The Nitro High Memory and HPC team owns the purpose built platform development for the High performance... 
    Suggested
    Internship
    Local area
    Flexible hours

    Amazon

    Santa Clara, CA
    1 day ago
  • $124k - $195.5k

    NVIDIA Gruppe in Santa Clara seeks an HPC Operations Engineer to design and implement compute clusters for silicon development. Ideal candidates will have experience troubleshooting in large-scale environments and enhancing deployment automation. Applicants should be proficient... 
    Suggested

    NVIDIA Gruppe

    Santa Clara, CA
    5 days ago
  • $90k - $110k

     ...CRWV) in March 2025. Learn more at About the Role At CoreWeave we are seeking a dedicated and detail-oriented Operations Engineer to join our HPC Networking Team. HPC Networking at CoreWeave is tasked with developing and operating some of the largest InfiniBand... 
    Suggested
    Permanent employment
    Temporary work
    Casual work
    Work at office
    Flexible hours

    CoreWeave

    Sunnyvale, CA
    14 days ago
  • $152k - $241.5k

     ...environment remains resilient, measurable, and aligned with long-term engineering demands. What you'll be doing: Manage, scale, and...  ...supporting and tuning job scheduling systems (LSF, Slurm, etc.) in HPC or silicon design environments Proficiency in Linux systems... 
    Suggested

    NVIDIA

    Santa Clara, CA
    4 days ago
  • $152k - $241.5k

    NVIDIA Gruppe in Santa Clara is seeking a Senior Software Engineer to enhance their HPC infrastructure. The role involves applying distributed systems patterns, automation, and building scalable services in a hybrid multi-cloud environment. Candidates should have strong... 

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  • $124k - $195.5k

     ...clusters at high reliability, efficiency, and performance, and drive foundational improvements and automation to improve engineers’ productivity. As an HPC Operations Engineer, you are responsible for the big picture of how our systems relate to each other, using a breadth... 

    NVIDIA Gruppe

    Santa Clara, CA
    5 days ago
  • $140k - $160k

     ...ASRC Federal is looking for a Senior HPC Engineer, as ASRC Federal InuTeq provides High Performance Computing services across the full HPC lifecycle including computational requirements, architecture, acquisition, and operations for federal government customers, while... 
    Contract work
    Weekend work

    ASRC Federal Holding Company

    Mountain View, CA
    4 days ago
  • $152k - $241.5k

     ...Come join the team and see how you can make a lasting impact on the world. We are seeking a highly skilled and experienced HPC Cluster Engineer to design, deploy, and operate GPU Compute Clusters for EDA (Electronic Design Automation) and high-performance computing... 

    NVIDIA

    Santa Clara, CA
    4 days ago
  • $152k - $241.5k

     ...and continuous improvement, ensuring they integrate cleanly with HPC schedulers, storage, and network fabrics. Use IaC (...  ...programming languages such as Python, Go, Perl, or Ruby. Mentored other engineers and influenced technical direction through design reviews, architecture... 

    NVIDIA Gruppe

    Santa Clara, CA
    5 days ago
  • $159.5k - $271.2k

     ...invest 15% of sales back into R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work together with the...  ...Strong experience designing and deploying storage solutions in HPC or high-performance environments Strong understanding of storage... 
    Minimum wage
    Work experience placement
    Flexible hours

    KLA

    Milpitas, CA
    2 days ago
  • NVIDIA Gruppe is looking for a senior engineer to join their Math Libraries team in Santa Clara, California. This role involves designing...  ...generation. The ideal candidate has over 8 years of experience in HPC software development using C++, along with leadership skills and... 

    NVIDIA Gruppe

    Santa Clara, CA
    5 days ago
  • $184k - $287.5k

     ...NVIDIA Math Libraries team is looking for a senior engineer to join our development efforts in the area of kernel generation for AI and HPC, specifically targeting matrix operations, JITing and fusions. Around the world, leading commercial and academic organizations are... 
    Remote work

    NVIDIA

    Santa Clara, CA
    1 day ago
  • $184k - $287.5k

     ...implement scalable, next-gen distributed storage services for HPC workloads, optimizing both performance and cost-effectiveness to...  ...to see: ~ Bachelor’s degree in Computer Science, Electrical Engineering or related field or equivalent experience. ~8+ years of experience... 

    NVIDIA

    Santa Clara, CA
    3 days ago
  • NVIDIA is searching for a highly skilled HPC Cluster Engineer to design, deploy, and operate GPU Compute Clusters for Electronic Design Automation and high-performance computing workloads across multiple teams and projects. The role collaborates with researchers and infrastructure... 

    NVIDIA Gruppe

    Santa Clara, CA
    5 days ago
  • $154.9k - $263.3k

     ...invest 15% of sales back into R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work together with the...  ...the digital age. Job Description/Preferred Qualifications HPC server systems are increasingly an essential and enabling component... 
    Minimum wage
    Work experience placement
    Flexible hours

    KLA

    Milpitas, CA
    3 days ago
  • $114.8k - $195.2k

     ...invest 15% of sales back into R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work together with the...  ...moment with us. Job Description/Preferred Qualifications HPC server systems are increasingly an essential and enabling component... 
    Minimum wage
    Work experience placement
    Flexible hours

    KLA

    Milpitas, CA
    3 days ago
  • $109k - $204k

     ...HPC Engineer New York, NY/ Bellevue, WA/ Sunnyvale, CA / Livingston, NJ CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence... 
    Permanent employment
    Temporary work
    Casual work
    Work at office
    Remote work
    Worldwide
    Flexible hours

    CoreWeave

    Sunnyvale, CA
    4 days ago
  • $165k - $242k

     ...HPC Performance Engineer CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and... 
    Temporary work
    Casual work
    Work at office
    Remote work
    Flexible hours

    CoreWeave

    Sunnyvale, CA
    4 days ago
  • NVIDIA Gruppe in Santa Clara, California is seeking a skilled HPC/AI Benchmarking and Telemetry Engineer to join their team. In this role, you will develop benchmarking approaches for large-scale HPC and AI clusters, create telemetry frameworks to capture performance data... 

    NVIDIA Gruppe

    Santa Clara, CA
    5 days ago
  • Schlumberger is seeking a High Performance Computing (HPC) Engineer in Sunnyvale, CA, to tackle complex discrete optimization problems. The ideal candidate will have a strong background in operations research and advanced mathematical programming, along with hands-on experience... 

    Schlumberger

    Sunnyvale, CA
    2 days ago
  • A leading cloud technology company is seeking a highly skilled HPC Performance Engineer to join their HAVOCK Team in Sunnyvale, California. In this role, you will optimize bare-metal systems and ensure the performance of complex workloads using various technologies including... 

    CoreWeave

    Sunnyvale, CA
    1 day ago
  • $155k - $185k

     ..., Cloud Computing, Enterprise IT, Hadoop/ Big Data, Hyperscale, HPC and IoT/Embedded customers worldwide. We are the #5 fastest growing...  ...community. We seek talented, passionate, and committed engineers, technologists, and business leaders to join us. Job Summary:... 
    Contract work
    Immediate start
    Worldwide

    Super Micro Computer

    San Jose, CA
    7 days ago
  • $162.8k - $217.6k

     ...and applications as well as computing infrastructure to enable engineers to solve problems faster and more efficiently. Promote the use of...  ...engineering teams to efficiently utilize High-Performance Computing (HPC) resources, and make informed decisions on infrastructure... 
    Local area

    Archer

    San Jose, CA
    2 days ago
  • $152k - $287.5k

    NVIDIA Gruppe is seeking a Partner Enablement Engineer in Santa Clara, California. This role offers an opportunity to support our partners...  ...with advanced networking solutions for Deep Learning and HPC applications using NCCL. The ideal candidate will possess a strong... 

    NVIDIA Gruppe

    Santa Clara, CA
    5 days ago
  • $175k - $230k

     ...AI/HPC System Engineer Job Title: AI/HPC System Engineer Office Location: San Jose, CA Job Type: Full-Time Work Model: Onsite About SK hynix America: At SK hynix America, we're at the forefront of semiconductor innovation, developing advanced memory solutions... 
    Full time
    Work at office
    Local area

    SK hynix America Inc.

    San Jose, CA
    3 days ago
  • KLA, located in Milpitas, California, is seeking an HPC Systems Architect to design and support HPC clusters vital for IC fabs and mask shops globally. The ideal candidate will have deep expertise in Linux systems and virtualization technologies, and will drive innovative... 

    KLA

    Milpitas, CA
    5 days ago
  • $154.9k - $263.3k

    KLA in Milpitas is seeking an expert in High-Performance Computing (HPC) to design and support HPC clusters essential for semiconductor manufacturing. This role requires strong Linux systems knowledge, experience in virtualization technology, and an understanding of HPC... 
    Work experience placement

    KLA

    Milpitas, CA
    5 days ago
  • $208k - $253k

     ...Hardware Production / Sustaining Engineer Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only...  ...with cutting-edge GPU architectures and how to leverage them in AI/HPC environments. Expertise supporting or designing systems... 
    Temporary work

    Crusoe

    Sunnyvale, CA
    3 days ago
  • $184k - $287.5k

     ...production in the field? We are looking for a compute and networking savvy Solution Architect to join the NVIDIA Solution Architecture Engineering (SA) team focused on supporting accelerated computing applications. As part of the NVIDIA SA organization, you will be driving... 
    Remote work

    NVIDIA

    Santa Clara, CA
    1 day ago