HPC Engineer
$150k - $300k
Institute of Foundation Models
This role provides operational coverage during Abu Dhabi overnight hours and serves as a primary point of contact for infrastructure monitoring, incident triage, researcher support, and production operations.
Responsibilities
- Monitor health, performance, and availability of large-scale GPU clusters.
- Respond to incidents and perform first-level triage.
- Support researchers and troubleshoot job failures.
- Execute operational runbooks and recovery procedures.
- Validate cluster deployments, upgrades, and maintenance activities.
- Track infrastructure utilization and operational metrics.
- Develop automation and monitoring tools.
- Contribute to documentation and reporting.
Education
Bachelor's degree in Computer Science, Computer Engineering, Software Engineering, Information Technology, Electrical Engineering, Mathematics, Physics, or a related discipline.
Experience
- 2+ years in Linux systems administration, SRE, DevOps, cloud operations, HPC, or infrastructure operations.
- Strong Linux troubleshooting skills.
- Experience with scripting using Python or Bash.
Preferred Qualifications
- Slurm.
- GPU infrastructure.
- AWS, Azure, or GCP.
- Grafana, Prometheus, Datadog, or similar tools.
- Containers and Kubernetes.
- AI/ML infrastructure exposure.
- Research computing environments.
Salary Range
$150,000 - $300,000 a year
Benefits Include
- Comprehensive medical, dental, and vision benefits
- Bonus
- 401(k) plan
- Generous paid time off, sick leave, and holidays
- Paid parental leave
- Employee Assistance Program
- Life and disability insurance



