HPC Engineer
$150k - $300k
Institute of Foundation Models
About MBZUAI
The Institute of Foundation Models (IFM) operates some of the world's largest AI supercomputing environments.
This role provides operational coverage during Abu Dhabi overnight hours and serves as a primary point of contact for infrastructure monitoring, incident triage, researcher support, and production operations.
Responsibilities
• Monitor health, performance, and availability of large-scale GPU clusters.
• Respond to incidents and perform first-level triage.
• Support researchers and troubleshoot job failures.
• Execute operational runbooks and recovery procedures.
• Validate cluster deployments, upgrades, and maintenance activities.
• Track infrastructure utilization and operational metrics.
• Develop automation and monitoring tools.
• Contribute to documentation and reporting.
Education
Bachelor's degree in Computer Science, Computer Engineering, Software Engineering, Information Technology, Electrical Engineering, Mathematics, Physics, or related disciplines.
Experience
• 2+ years in Linux systems administration, SRE, DevOps, cloud operations, HPC, or infrastructure operations.
• Strong Linux troubleshooting skills.
• Experience with scripting using Python or Bash.
Preferred Qualifications
• Slurm.
• GPU infrastructure.
• AWS, Azure, or GCP.
• Grafana, Prometheus, Datadog, or similar tools.
• Containers and Kubernetes.
• AI/ML infrastructure exposure.
• Research computing environments.
Salary Range
$150,000 - $300,000 a year
The posted salary range represents the company's good-faith estimate of the compensation for this position upon hire. The actual compensation offered may vary within this range depending on individual qualifications and other factors, including but not limited to relevant skills, experience, education, certifications, geographic location, and specific business needs.
Benefits Include
• Comprehensive medical, dental, and vision benefits
• Bonus
• 401(k) Plan
• Generous paid time off, sick leave, and holidays
• Paid Parental Leave
• Employee Assistance Program
• Life and disability insurance
Vacancy posted 3 days ago
Similar jobs that could be interesting for you
Based on the HPC Engineer in Sunnyvale, CA vacancy
$165k - $220k
...CX organization aligns closely with the internal and customer engineering teams, offering valuable insights from the field and having the... ...focusing on AI/ML workloads within high-performance compute (HPC) environments. Collaborate closely with customers to understand...
$148.7k - $201.2k
...performance across our product line. The Nitro Team is looking for engineers with systems knowledge and experience in areas such as Linux OS... ...performance computing workloads. The Nitro High Memory and HPC team owns the purpose-built platform development for the High performance...
$124k - $195.5k
NVIDIA in Santa Clara seeks an HPC Operations Engineer to design and implement compute clusters for silicon development. Ideal candidates will have experience troubleshooting in large-scale environments and enhancing deployment automation. Applicants should be proficient...
$90k - $110k
...CRWV) in March 2025. Learn more at About the Role: At CoreWeave, we are seeking a dedicated and detail-oriented Operations Engineer to join our HPC Networking Team. HPC Networking at CoreWeave is tasked with developing and operating some of the largest InfiniBand...
$152k - $241.5k
...environment remains resilient, measurable, and aligned with long-term engineering demands. What you'll be doing: Manage, scale, and... ...supporting and tuning job scheduling systems (LSF, Slurm, etc.) in HPC or silicon design environments. Proficiency in Linux systems...
$152k - $241.5k
NVIDIA in Santa Clara is seeking a Senior Software Engineer to enhance their HPC infrastructure. The role involves applying distributed systems patterns, automation, and building scalable services in a hybrid multi-cloud environment. Candidates should have strong...
$124k - $195.5k
...clusters at high reliability, efficiency, and performance, and drive foundational improvements and automation to improve engineers’ productivity. As an HPC Operations Engineer, you are responsible for the big picture of how our systems relate to each other, using a breadth...
$140k - $160k
...ASRC Federal is looking for a Senior HPC Engineer, as ASRC Federal InuTeq provides High Performance Computing services across the full HPC lifecycle including computational requirements, architecture, acquisition, and operations for federal government customers, while...
$152k - $241.5k
...Come join the team and see how you can make a lasting impact on the world. We are seeking a highly skilled and experienced HPC Cluster Engineer to design, deploy, and operate GPU Compute Clusters for EDA (Electronic Design Automation) and high-performance computing...
$152k - $241.5k
...and continuous improvement, ensuring they integrate cleanly with HPC schedulers, storage, and network fabrics. Use IaC (... ...programming languages such as Python, Go, Perl, or Ruby. Mentored other engineers and influenced technical direction through design reviews, architecture...
$159.5k - $271.2k
...invest 15% of sales back into R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work together with the... ...Strong experience designing and deploying storage solutions in HPC or high-performance environments. Strong understanding of storage...
- NVIDIA is looking for a senior engineer to join their Math Libraries team in Santa Clara, California. This role involves designing... ...generation. The ideal candidate has over 8 years of experience in HPC software development using C++, along with leadership skills and...
$184k - $287.5k
...NVIDIA Math Libraries team is looking for a senior engineer to join our development efforts in the area of kernel generation for AI and HPC, specifically targeting matrix operations, JITing and fusions. Around the world, leading commercial and academic organizations are...
$184k - $287.5k
...implement scalable, next-gen distributed storage services for HPC workloads, optimizing both performance and cost-effectiveness to... ...to see: ~ Bachelor’s degree in Computer Science, Electrical Engineering or related field or equivalent experience. ~8+ years of experience...
- NVIDIA is searching for a highly skilled HPC Cluster Engineer to design, deploy, and operate GPU Compute Clusters for Electronic Design Automation and high-performance computing workloads across multiple teams and projects. The role collaborates with researchers and infrastructure...
$154.9k - $263.3k
...invest 15% of sales back into R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work together with the... ...the digital age. Job Description/Preferred Qualifications: HPC server systems are increasingly an essential and enabling component...
$114.8k - $195.2k
...invest 15% of sales back into R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work together with the... ...moment with us. Job Description/Preferred Qualifications: HPC server systems are increasingly an essential and enabling component...
$109k - $204k
...HPC Engineer (New York, NY / Bellevue, WA / Sunnyvale, CA / Livingston, NJ): CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence...
$165k - $242k
...HPC Performance Engineer: CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and...
- NVIDIA in Santa Clara, California, is seeking a skilled HPC/AI Benchmarking and Telemetry Engineer to join their team. In this role, you will develop benchmarking approaches for large-scale HPC and AI clusters, create telemetry frameworks to capture performance data...
- Schlumberger is seeking a High Performance Computing (HPC) Engineer in Sunnyvale, CA, to tackle complex discrete optimization problems. The ideal candidate will have a strong background in operations research and advanced mathematical programming, along with hands-on experience...
- A leading cloud technology company is seeking a highly skilled HPC Performance Engineer to join their HAVOCK Team in Sunnyvale, California. In this role, you will optimize bare-metal systems and ensure the performance of complex workloads using various technologies including...
$155k - $185k
..., Cloud Computing, Enterprise IT, Hadoop/Big Data, Hyperscale, HPC and IoT/Embedded customers worldwide. We are the #5 fastest growing... ...community. We seek talented, passionate, and committed engineers, technologists, and business leaders to join us. Job Summary:...
$162.8k - $217.6k
...and applications as well as computing infrastructure to enable engineers to solve problems faster and more efficiently. Promote the use of... ...engineering teams to efficiently utilize High-Performance Computing (HPC) resources, and make informed decisions on infrastructure...
$152k - $287.5k
NVIDIA is seeking a Partner Enablement Engineer in Santa Clara, California. This role offers an opportunity to support our partners... ...with advanced networking solutions for Deep Learning and HPC applications using NCCL. The ideal candidate will possess a strong...
$175k - $230k
...AI/HPC System Engineer. Job Title: AI/HPC System Engineer; Office Location: San Jose, CA; Job Type: Full-Time; Work Model: Onsite. About SK hynix America: At SK hynix America, we're at the forefront of semiconductor innovation, developing advanced memory solutions...
- KLA, located in Milpitas, California, is seeking an HPC Systems Architect to design and support HPC clusters vital for IC fabs and mask shops globally. The ideal candidate will have deep expertise in Linux systems and virtualization technologies, and will drive innovative...
$154.9k - $263.3k
KLA in Milpitas is seeking an expert in High-Performance Computing (HPC) to design and support HPC clusters essential for semiconductor manufacturing. This role requires strong Linux systems knowledge, experience in virtualization technology, and an understanding of HPC...
$208k - $253k
...Hardware Production / Sustaining Engineer: Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only... ...with cutting-edge GPU architectures and how to leverage them in AI/HPC environments. Expertise supporting or designing systems...
$184k - $287.5k
...production in the field? We are looking for a compute and networking savvy Solution Architect to join the NVIDIA Solution Architecture Engineering (SA) team focused on supporting accelerated computing applications. As part of the NVIDIA SA organization, you will be driving...


