Senior GPU/HPC Infra Engineer — High-Perf AI Cluster
Sciforium
Sciforium, an AI infrastructure company in San Francisco, is looking for a Senior HPC & GPU Infrastructure Engineer to manage the health and performance of our GPU compute cluster. You will be the primary custodian of a high-density accelerator environment, bridging hardware operations and machine learning workflows. The ideal candidate will have over 5 years of HPC experience, strong knowledge of GPU debugging, and expertise in Linux systems. The role offers benefits like medical insurance, a 401k plan, and flexible time off.
- A leading AI infrastructure company is looking for a Senior Site Reliability Engineer to design and operate large-scale GPU clusters. In this role, you will work closely with clients to troubleshoot... ...experience with GPU systems, high-performance networking, and Linux internals... Senior
$250k
...rapidly scaling AI cloud infrastructure... ...next-generation GPU platform... ...is looking for a Senior / Staff Site Reliability Engineer to support and scale... ...large-scale HPC and cloud environments... ...for GPU compute clusters Collaborate... ...to deliver highly available infrastructure... Senior, Permanent employment, Remote work, $250k
...Limited in San Francisco is seeking an experienced engineer to design and maintain large-scale GPU clusters for training and inference. The candidate should have... ...Linux systems. Experience with observability stacks and high-performance computing is preferred. The role offers... Senior
- Hamilton Barnes Associates Limited is looking for a Senior Storage Engineer to support large-scale AI infrastructure in San Francisco. This role involves designing scalable storage solutions for high-performance GPU platforms. The ideal candidate has extensive experience... Senior, Remote job
- Cortes 23 in San Francisco is seeking a Senior Site Reliability Engineer to design and operate large-scale GPU infrastructure. This high-impact role requires deep expertise in distributed... ...for customers managing large-scale AI workloads. The position offers the opportunity... Senior, Remote job
- A tech company focused on AI is seeking a Site Reliability Engineer to ensure the reliability and performance of its GPU marketplace. This role involves maintaining service level objectives, managing capacity, and implementing secure systems. The ideal candidate has strong... Senior
- A leading AI technology company in San Francisco is looking for a Senior Software Engineer to build scalable infrastructure for large‑scale training and fine-tuning of foundation... ...distributed training systems and optimize GPU utilization while collaborating with cross-... Senior
- A leading AI research company in San Francisco is seeking a software engineer for its Fleet High Performance Computing team. In this role, you'll ensure the reliability and uptime of the compute fleet, working with automation systems and monitoring tools. Ideal candidates... Senior
$180k - $250k
A tech innovation company is looking for a hands-on engineer in San Francisco to manage a vast fleet of GPU servers. You will build systems for tracking server lifecycle, automate provisioning and health checks, and ensure OS-level security. The role requires 5+ years of... Senior
- ...Senior HPC & GPU Infrastructure Engineer Sciforium is an AI infrastructure company developing next-generation multimodal AI models and a proprietary, high-efficiency serving platform. Backed by multi-million... ...performance of our GPU compute cluster. You will be the primary... Senior, Flexible hours
$179k - $218k
...Senior Staff Data Center Operations Engineer, GPU Hardware Architecture Crusoe is on a mission to... ...only vertically integrated AI infrastructure company... ...strategies, and be part of a high-performing team that... ...needed to maintain peak cluster health. The Strategic... Senior, Temporary work
- ...someone passionate about operating highly robust, observable, and... ...experience in Site Reliability Engineering, DevOps, or a similar role... ...of managing data center grade GPU clusters with GPU (and peripherals like... ...serving, or distributed AI frameworks, (Desirable) Hands... Senior
- Senior Infrastructure Engineer - Bland As a Senior Infrastructure Engineer... ...Kubernetes that handle high-volume, real-time voice... ...industries. Lead - AI/ML Stack Infrastructure... ...production Kubernetes clusters optimized for AI/ML workloads with GPU support, implementing... Senior, Temporary work
$230k
...and more, ensuring high availability, performance..., and responsible AI deployment over... ...As a software engineer on the Fleet High Performance... ...Computing (HPC) team, you will be... ...Collaborate with clusters, networking, and infrastructure... ...management, kernel perf tuning)... $190k - $270k
AI Chopping Block, Inc. is seeking an AI Infrastructure Engineer in San Francisco. This role requires maintaining user-facing services and production systems, specializing in systems while ensuring their reliability and scalability. Candidates should have 5+ years of experience... Senior
- A technology infrastructure company in San Francisco is seeking an experienced engineer to manage and operate GPU clusters. The role requires over 5 years of hands-on experience, a deep understanding of hardware systems, and a passion for automating fleet operations. You... Senior
- MakerMaker, based in San Francisco, is seeking a highly skilled kernel engineer to write and optimize GPU kernels that enhance performance for training and inference. This role involves deep, low-level work to close the significant performance gap that exists in modern... Senior
- Baseten is hiring a Network Engineer (Data Centers) in San Francisco to design and own the high-performance network infrastructure for their GPU clusters. This senior role collaborates closely with hardware and platform teams, directly impacting model performance and inference... Senior, Flexible hours
- Requirements Deep experience with GPU programming and performance work... ...years of professional software engineering experience with meaningful work on ML inference or high-performance systems,... ...and learn from production incidents. Perplexity AI
- Senior Site Reliability Engineer - AI Infrastructure Location: Global Remote / San... ...Andromeda Andromeda Cluster was founded by Nat... ...and debug large-scale GPU infrastructure used for... ...and performance of high-speed interconnects (... ...with Slurm or other HPC schedulers is equally... Senior, Full time, Remote work
$261k - $326k
A technology company specializing in AI infrastructure is seeking a Principal Engineer to enhance reliability and scalability of cloud systems. This role demands... ...expertise and systems fundamentals, especially in high-scale environments. Competitive compensation includes... Senior
- Lambda Inc. is seeking an experienced HPC Engineer to join our team in San Francisco. In this role, you will be responsible for deploying and configuring large-scale HPC clusters for AI workloads, troubleshooting issues, and mentoring junior engineers. The ideal candidate...
- A cutting-edge AI technology company based in San Francisco is seeking a specialist to design and operate large-scale GPU infrastructure. This role requires expertise in deploying GPU systems for high-throughput inference and model performance optimization. The ideal candidate... Senior
- Neura Market is seeking an HPC Engineer to build and configure large-scale HPC clusters for AI workloads. This role requires working 4 days a week onsite in San Francisco/Bellevue, where you will collaborate closely with teams to troubleshoot and improve systems. The ideal...
$188k - $275k
...The Essential Cloud for AI™. Built for pioneers by... ...You'll Do: The Field Engineering organization at CoreWeave..., reliable, and high-performance experience.... ...lifecycle: leading new GPU cluster bring-up and acceptance... ...fabric validation and HPC performance benchmarking... Senior, Permanent employment, Full time, Contract work, Temporary work, Casual work, Work at office, Flexible hours, $90k - $210k
...Overview: The Cambridge HPC AI Technologist is a field... ...deployment services or cluster administration.... ...Computer Science, Computer Engineering, or science related field... ...display solid knowledge of GPU-focused hardware/... ...ability to multitask, and high attention to detail.... Full time, Local area, Remote work, $140k - $225k
A leading innovation lab in San Francisco seeks a Senior Antenna Engineer to lead high-performance antenna systems from conception to production. The ideal candidate will have over 10 years of experience in consumer electronics and expertise in antenna design. Responsibilities... Senior
- Epoch Biodesign in San Francisco is seeking a Senior Staff Cloud Support Engineer to lead technical escalations and improve cloud infrastructure... ...influence architectural decisions while ensuring high availability for AI workloads. The ideal candidate has over 8 years of... Senior
- .... When people finance GPU clusters, the datacenters housing... ...market? Otherwise, as AI scales, compute only... ...culture, mentor junior engineers, and learn from our customers... ...scaling at least one HPC or GPU compute cluster... ...troubleshooting high‑speed fabrics such as InfiniBand... Long term contract, Contract work, Fixed term contract, Work at office, Local area, Visa sponsorship, Shift work, 3 days per week
- ...CoreHPC team at UCSF Health is looking for an HPC Systems Engineer to enhance and maintain the Institute’s HPC clusters. The role involves defining and implementing complex... ...experience with HPC systems, and expertise in high-performance parallel filesystems like Lustre... Senior

