Senior GPU/HPC Infra Engineer — High-Perf AI Cluster

Sciforium

Sciforium, an AI infrastructure company in San Francisco, is looking for a Senior HPC & GPU Infrastructure Engineer to manage the health and performance of its GPU compute cluster. You will be the primary custodian of a high-density accelerator environment, bridging hardware operations and machine learning workflows. The ideal candidate will have more than 5 years of HPC experience, strong GPU debugging skills, and expertise in Linux systems. The role offers benefits such as medical insurance, a 401(k) plan, and flexible time off.

Vacancy posted 10 hours ago

Similar jobs that could be interesting for you, based on the Senior GPU/HPC Infra Engineer — High-Perf AI Cluster vacancy in San Francisco, CA

Senior AI Infra SRE: GPU Clusters & High-Perf Networking
A leading AI infrastructure company is looking for a Senior Site Reliability Engineer to design and operate large-scale GPU clusters. In this role, you will work closely with clients to troubleshoot... ...experience with GPU systems, high-performance networking, and Linux internals...
Senior
Andromeda
San Francisco, CA
4 days ago
Senior Site Reliability Engineer (GPU Clusters) - Hosting
$250k
...rapidly scaling AI cloud infrastructure... ...next-generation GPU platform... ...is looking for a Senior / Staff Site Reliability Engineer to support and scale... ...large-scale HPC and cloud environments... ...for GPU compute clusters Collaborate... ...to deliver highly available infrastructure...
Senior
Permanent employment
Remote work
San Francisco, CA
17 days ago
Senior SRE — AI GPU Infra for Large-Scale HPC (IPO Equity)
$250k
...Limited in San Francisco is seeking an experienced engineer to design and maintain large-scale GPU clusters for training and inference. The candidate should have... ...Linux systems. Experience with observability stacks and high-performance computing is preferred. The role offers...
Senior
Hamilton Barnes Associates Limited
San Francisco, CA
2 days ago
Senior AI Storage Engineer - Remote GPU HPC Infra
Hamilton Barnes Associates Limited is looking for a Senior Storage Engineer to support large-scale AI infrastructure in San Francisco. This role involves designing scalable storage solutions for high-performance GPU platforms. The ideal candidate has extensive experience...
Senior
Remote job
Hamilton Barnes Associates Limited
San Francisco, CA
1 day ago
Senior AI Infra SRE: Remote GPU Clusters & Performance
Cortes 23 in San Francisco is seeking a Senior Site Reliability Engineer to design and operate large-scale GPU infrastructure. This high-impact role requires deep expertise in distributed... ...for customers managing large-scale AI workloads. The position offers the opportunity...
Senior
Remote job
Cortes 23
San Francisco, CA
3 days ago
Senior Site Reliability Engineer - AI Cloud & GPU Infra
A tech company focused on AI is seeking a Site Reliability Engineer to ensure the reliability and performance of its GPU marketplace. This role involves maintaining service level objectives, managing capacity, and implementing secure systems. The ideal candidate has strong...
Senior
Hyperbolic Labs
San Francisco, CA
1 day ago
Senior ML Training Systems Engineer - Distributed GPU Infra
A leading AI technology company in San Francisco is looking for a Senior Software Engineer to build scalable infrastructure for large‑scale training and fine-tuning of foundation... ...distributed training systems and optimize GPU utilization while collaborating with cross-...
Senior
Baseten
San Francisco, CA
1 day ago
Senior GPU HPC Platform Reliability Engineer
A leading AI research company in San Francisco is seeking a software engineer for its Fleet High Performance Computing team. In this role, you'll ensure the reliability and uptime of the compute fleet, working with automation systems and monitoring tools. Ideal candidates...
Senior
OpenAI
San Francisco, CA
10 hours ago
Senior GPU Infra Engineer — AI Fleet Automation
$180k - $250k
A tech innovation company is looking for a hands-on engineer in San Francisco to manage a vast fleet of GPU servers. You will build systems for tracking server lifecycle, automate provisioning and health checks, and ensure OS-level security. The role requires 5+ years of...
Senior
Fal
San Francisco, CA
1 day ago
Senior HPC & GPU Infrastructure Engineer
...Senior HPC & GPU Infrastructure Engineer Sciforium is an AI infrastructure company developing next-generation multimodal AI models and a proprietary, high-efficiency serving platform. Backed by multi-million... ...performance of our GPU compute cluster. You will be the primary...
Senior
Flexible hours
Sciforium
San Francisco, CA
6 days ago
Senior Staff Data Center Operations Engineer, GPU Hardware Architecture
$179k - $218k
...Senior Staff Data Center Operations Engineer, GPU Hardware Architecture Crusoe is on a mission to... ...only vertically integrated AI infrastructure company... ...strategies, and be part of a high-performing team that... ...needed to maintain peak cluster health. The Strategic...
Senior
Temporary work
Crusoe
San Francisco, CA
9 days ago
Senior Cluster SRE & Cloud Ops Engineer
...someone passionate about operating highly robust, observable, and... ...experience in Site Reliability Engineering, DevOps, or a similar role... ...of managing data center grade GPU clusters with GPU (and peripherals like... ...serving, or distributed AI frameworks, (Desirable) Hands...
Senior
Fireworks AI
San Francisco, CA
4 days ago
Senior AI/ML Infra & SRE Engineer
Senior Infrastructure Engineer - Bland As a Senior Infrastructure Engineer... ...Kubernetes that handle high-volume, real-time voice... ...industries. Lead - AI/ML Stack Infrastructure... ...production Kubernetes clusters optimized for AI/ML workloads with GPU support, implementing...
Senior
Temporary work
AI Chopping Block, Inc.
San Francisco, CA
1 day ago
Software Engineer, GPU Infrastructure - HPC
$230k
...and more, ensuring high availability, performance... ..., and responsible AI deployment over... ...As a software engineer on the Fleet High Performance... ...Computing (HPC) team, you will be... ...Collaborate with clusters, networking, and infrastructure... ...management, kernel perf tuning)...
OpenAI
San Francisco, CA
4 days ago
Senior AI Infra Engineer - GPU Clusters & Scale
$190k - $270k
AI Chopping Block, Inc. is seeking an AI Infrastructure Engineer in San Francisco. This role requires maintaining user-facing services and production systems, specializing in systems while ensuring their reliability and scalability. Candidates should have 5+ years of experience...
Senior
AI Chopping Block, Inc.
San Francisco, CA
4 days ago
Senior HPC GPU Compute Engineer (Hybrid SF)
A technology infrastructure company in San Francisco is seeking an experienced engineer to manage and operate GPU clusters. The role requires over 5 years of hands-on experience, a deep understanding of hardware systems, and a passion for automating fleet operations. You...
Senior
The San Francisco Compute Company
San Francisco, CA
10 hours ago
Senior GPU Kernel Engineer - Accelerate AI Training Systems
MakerMaker, based in San Francisco, is seeking a highly skilled kernel engineer to write and optimize GPU kernels that enhance performance for training and inference. This role involves deep, low-level work to close the significant performance gap that exists in modern...
Senior
MakerMaker
San Francisco, CA
4 days ago
Senior Data Center Network Engineer - GPU Clusters
Baseten is hiring a Network Engineer (Data Centers) in San Francisco to design and own the high-performance network infrastructure for their GPU clusters. This senior role collaborates closely with hardware and platform teams, directly impacting model performance and inference...
Senior
Flexible hours
Baseten
San Francisco, CA
1 day ago
AI Inference Engineer — High-Performance GPU Systems
Requirements Deep experience with GPU programming and performance work... ...years of professional software engineering experience with meaningful work on ML inference or high-performance systems,... ...and learn from production incidents
Perplexity AI
San Francisco, CA
4 days ago
Senior Site Reliability Engineer - AI Infrastructure
Senior Site Reliability Engineer - AI Infrastructure Location: Global Remote / San... ...Andromeda Andromeda Cluster was founded by Nat... ...and debug large-scale GPU infrastructure used for... ...and performance of high-speed interconnects (... ...with Slurm or other HPC schedulers is equally...
Senior
Full time
Remote work
Andromeda
San Francisco, CA
4 days ago
Senior Principal Cloud Infra Reliability Engineer
$261k - $326k
A technology company specializing in AI infrastructure is seeking a Principal Engineer to enhance reliability and scalability of cloud systems. This role demands... ...expertise and systems fundamentals, especially in high-scale environments. Competitive compensation includes...
Senior
Crusoe
San Francisco, CA
4 days ago
HPC Operations Engineer — AI Cloud Infra (On-site 4d/wk)
Lambda Inc. is seeking an experienced HPC Engineer to join our team in San Francisco. In this role, you will be responsible for deploying and configuring large-scale HPC clusters for AI workloads, troubleshooting issues, and mentoring junior engineers. The ideal candidate...
Lambda Inc.
San Francisco, CA
10 hours ago
Senior GPU ML Infra Engineer — Mid-Training & Inference
A cutting-edge AI technology company based in San Francisco is seeking a specialist to design and operate large-scale GPU infrastructure. This role requires expertise in deploying GPU systems for high-throughput inference and model performance optimization. The ideal candidate...
Senior
Reflection AI
San Francisco, CA
1 day ago
HPC Cloud Engineer - AI Clusters & Automation
Neura Market is seeking an HPC Engineer to build and configure large-scale HPC clusters for AI workloads. This role requires working 4 days a week onsite in San Francisco/Bellevue, where you will collaborate closely with teams to troubleshoot and improve systems. The ideal...
Neura Market
San Francisco, CA
4 days ago
Senior Specialist Field Engineer - Compute Infrastructure
$188k - $275k
...The Essential Cloud for AI™. Built for pioneers by... ...You'll Do: The Field Engineering organization at CoreWeave... ..., reliable, and high-performance experience.... ...lifecycle: leading new GPU cluster bring-up and acceptance... ...fabric validation and HPC performance benchmarking...
Senior
Permanent employment
Full time
Contract work
Temporary work
Casual work
Work at office
Flexible hours
CoreWeave
San Francisco, CA
2 days ago
HPC AI Technologist
$90k - $210k
...Overview: The Cambridge HPC AI Technologist is a field... ...deployment services or cluster administration.... ...Computer Science, Computer Engineering, or science related field... ...display solid knowledge of GPU-focused hardware/... ...ability to multitask, and high attention to detail....
Full time
Local area
Remote work
Cambridge Computer Services, Inc
San Francisco, CA
3 days ago
Senior Antenna Engineer - Lead High-Perf Wireless Systems
$140k - $225k
A leading innovation lab in San Francisco seeks a Senior Antenna Engineer to lead high-performance antenna systems from conception to production. The ideal candidate will have over 10 years of experience in consumer electronics and expertise in antenna design. Responsibilities...
Senior
HP IQ
San Francisco, CA
2 days ago
Senior Staff Cloud Reliability Engineer for AI Infra
Epoch Biodesign in San Francisco is seeking a Senior Staff Cloud Support Engineer to lead technical escalations and improve cloud infrastructure... ...influence architectural decisions while ensuring high availability for AI workloads. The ideal candidate has over 8 years of...
Senior
Epoch Biodesign
San Francisco, CA
3 days ago
HPC/ GPU Hardware Engineer
.... When people finance GPU clusters, the datacenters housing... ...market? Otherwise, as AI scales, compute only... ...culture, mentor junior engineers, and learn from our customers... ...scaling at least one HPC or GPU compute cluster... ...troubleshooting high‑speed fabrics such as InfiniBand...
Long term contract
Contract work
Fixed term contract
Work at office
Local area
Visa sponsorship
Shift work
3 days per week
The San Francisco Compute Company
San Francisco, CA
4 days ago
Senior HPC Systems Engineer - Research Compute
...CoreHPC team at UCSF Health is looking for an HPC Systems Engineer to enhance and maintain the Institute’s HPC clusters. The role involves defining and implementing complex... ...experience with HPC systems, and expertise in high-performance parallel filesystems like Lustre...
Senior
UCSF Health
San Francisco, CA
42 minutes ago