Senior GPU/HPC Infra Engineer — High-Perf AI Cluster

Sciforium

Sciforium, an AI infrastructure company in San Francisco, is looking for a Senior HPC & GPU Infrastructure Engineer to manage the health and performance of our GPU compute cluster. You will be the primary custodian of a high-density accelerator environment, bridging hardware operations and machine learning workflows. The ideal candidate will have over 5 years of HPC experience, strong knowledge of GPU debugging, and expertise in Linux systems. The role offers benefits like medical insurance, a 401k plan, and flexible time off.

Vacancy posted 10 hours ago
Similar jobs that could be interesting for you
Based on the Senior GPU/HPC Infra Engineer — High-Perf AI Cluster in San Francisco, CA vacancy
  • A leading AI infrastructure company is looking for a Senior Site Reliability Engineer to design and operate large-scale GPU clusters. In this role, you will work closely with clients to troubleshoot...  ...experience with GPU systems, high-performance networking, and Linux internals... 
    Senior

    Andromeda

    San Francisco, CA
    4 days ago
  • $250k

     ...rapidly scaling AI cloud infrastructure...  ...next-generation GPU platform...  ...is looking for a Senior / Staff Site Reliability Engineer to support and scale...  ...large-scale HPC and cloud environments...  ...for GPU compute clusters Collaborate...  ...to deliver highly available infrastructure... 
    Senior
    Permanent employment
    Remote work
    San Francisco, CA
    17 days ago
  • $250k

     ...Limited in San Francisco is seeking an experienced engineer to design and maintain large-scale GPU clusters for training and inference. The candidate should have...  ...Linux systems. Experience with observability stacks and high-performance computing is preferred. The role offers... 
    Senior

    Hamilton Barnes Associates Limited

    San Francisco, CA
    2 days ago
  • Hamilton Barnes Associates Limited is looking for a Senior Storage Engineer to support large-scale AI infrastructure in San Francisco. This role involves designing scalable storage solutions for high-performance GPU platforms. The ideal candidate has extensive experience... 
    Senior
    Remote job

    Hamilton Barnes Associates Limited

    San Francisco, CA
    1 day ago
  • Cortes 23 in San Francisco is seeking a Senior Site Reliability Engineer to design and operate large-scale GPU infrastructure. This high-impact role requires deep expertise in distributed...  ...for customers managing large-scale AI workloads. The position offers the opportunity... 
    Senior
    Remote job

    Cortes 23

    San Francisco, CA
    3 days ago
  • A tech company focused on AI is seeking a Site Reliability Engineer to ensure the reliability and performance of its GPU marketplace. This role involves maintaining service level objectives, managing capacity, and implementing secure systems. The ideal candidate has strong... 
    Senior

    Hyperbolic Labs

    San Francisco, CA
    1 day ago
  • A leading AI technology company in San Francisco is looking for a Senior Software Engineer to build scalable infrastructure for large‑scale training and fine-tuning of foundation...  ...distributed training systems and optimize GPU utilization while collaborating with cross-... 
    Senior

    Baseten

    San Francisco, CA
    1 day ago
  • A leading AI research company in San Francisco is seeking a software engineer for its Fleet High Performance Computing team. In this role, you'll ensure the reliability and uptime of the compute fleet, working with automation systems and monitoring tools. Ideal candidates... 
    Senior

    OpenAI

    San Francisco, CA
    10 hours ago
  • $180k - $250k

    A tech innovation company is looking for a hands-on engineer in San Francisco to manage a vast fleet of GPU servers. You will build systems for tracking server lifecycle, automate provisioning and health checks, and ensure OS-level security. The role requires 5+ years of... 
    Senior

    Fal

    San Francisco, CA
    1 day ago
  •  ...Senior HPC & GPU Infrastructure Engineer Sciforium is an AI infrastructure company developing next-generation multimodal AI models and a proprietary, high-efficiency serving platform. Backed by multi-million...  ...performance of our GPU compute cluster. You will be the primary... 
    Senior
    Flexible hours

    Sciforium

    San Francisco, CA
    6 days ago
  • $179k - $218k

     ...Senior Staff Data Center Operations Engineer, GPU Hardware Architecture Crusoe is on a mission to...  ...only vertically integrated AI infrastructure company...  ...strategies, and be part of a high-performing team that...  ...needed to maintain peak cluster health. The Strategic... 
    Senior
    Temporary work

    Crusoe

    San Francisco, CA
    9 days ago
  •  ...someone passionate about operating highly robust, observable, and...  ...experience in Site Reliability Engineering, DevOps, or a similar role...  ...of managing data center grade GPU clusters with GPU (and peripherals like...  ...serving, or distributed AI frameworks, (Desirable) Hands... 
    Senior

    Fireworks AI

    San Francisco, CA
    4 days ago
  • Senior Infrastructure Engineer - Bland As a Senior Infrastructure Engineer...  ...Kubernetes that handle high-volume, real-time voice...  ...industries. Lead - AI/ML Stack Infrastructure...  ...production Kubernetes clusters optimized for AI/ML workloads with GPU support, implementing... 
    Senior
    Temporary work

    AI Chopping Block, Inc.

    San Francisco, CA
    1 day ago
  • $230k

     ...and more, ensuring high availability, performance...  ..., and responsible AI deployment over...  ...As a software engineer on the Fleet High Performance...  ...Computing (HPC) team, you will be...  ...Collaborate with clusters, networking, and infrastructure...  ...management, kernel perf tuning)... 

    OpenAI

    San Francisco, CA
    4 days ago
  • $190k - $270k

    AI Chopping Block, Inc. is seeking an AI Infrastructure Engineer in San Francisco. This role requires maintaining user-facing services and production systems, specializing in systems while ensuring their reliability and scalability. Candidates should have 5+ years of experience... 
    Senior

    AI Chopping Block, Inc.

    San Francisco, CA
    4 days ago
  • A technology infrastructure company in San Francisco is seeking an experienced engineer to manage and operate GPU clusters. The role requires over 5 years of hands-on experience, a deep understanding of hardware systems, and a passion for automating fleet operations. You... 
    Senior

    The San Francisco Compute Company

    San Francisco, CA
    10 hours ago
  • MakerMaker, based in San Francisco, is seeking a highly skilled kernel engineer to write and optimize GPU kernels that enhance performance for training and inference. This role involves deep, low-level work to close the significant performance gap that exists in modern... 
    Senior

    MakerMaker

    San Francisco, CA
    4 days ago
  • Baseten is hiring a Network Engineer (Data Centers) in San Francisco to design and own the high-performance network infrastructure for their GPU clusters. This senior role collaborates closely with hardware and platform teams, directly impacting model performance and inference... 
    Senior
    Flexible hours

    Baseten

    San Francisco, CA
    1 day ago
  • Requirements: Deep experience with GPU programming and performance work...  ...years of professional software engineering experience with meaningful work on ML inference or high-performance systems,...  ...and learn from production incidents

    Perplexity AI

    San Francisco, CA
    4 days ago
  • Senior Site Reliability Engineer - AI Infrastructure Location: Global Remote / San...  ...Andromeda Andromeda Cluster was founded by Nat...  ...and debug large-scale GPU infrastructure used for...  ...and performance of high-speed interconnects (...  ...with Slurm or other HPC schedulers is equally... 
    Senior
    Full time
    Remote work

    Andromeda

    San Francisco, CA
    4 days ago
  • $261k - $326k

    A technology company specializing in AI infrastructure is seeking a Principal Engineer to enhance reliability and scalability of cloud systems. This role demands...  ...expertise and systems fundamentals, especially in high-scale environments. Competitive compensation includes... 
    Senior

    Crusoe

    San Francisco, CA
    4 days ago
  • Lambda Inc. is seeking an experienced HPC Engineer to join our team in San Francisco. In this role, you will be responsible for deploying and configuring large-scale HPC clusters for AI workloads, troubleshooting issues, and mentoring junior engineers. The ideal candidate... 

    Lambda Inc.

    San Francisco, CA
    10 hours ago
  • A cutting-edge AI technology company based in San Francisco is seeking a specialist to design and operate large-scale GPU infrastructure. This role requires expertise in deploying GPU systems for high-throughput inference and model performance optimization. The ideal candidate... 
    Senior

    Reflection AI

    San Francisco, CA
    1 day ago
  • Neura Market is seeking an HPC Engineer to build and configure large-scale HPC clusters for AI workloads. This role requires working 4 days a week onsite in San Francisco/Bellevue, where you will collaborate closely with teams to troubleshoot and improve systems. The ideal... 

    Neura Market

    San Francisco, CA
    4 days ago
  • $188k - $275k

     ...The Essential Cloud for AI™. Built for pioneers by...  ...You'll Do: The Field Engineering organization at CoreWeave...  ..., reliable, and high-performance experience....  ...lifecycle: leading new GPU cluster bring-up and acceptance...  ...fabric validation and HPC performance benchmarking... 
    Senior
    Permanent employment
    Full time
    Contract work
    Temporary work
    Casual work
    Work at office
    Flexible hours

    CoreWeave

    San Francisco, CA
    2 days ago
  • $90k - $210k

     ...Overview: The Cambridge HPC AI Technologist is a field...  ...deployment services or cluster administration....  ...Computer Science, Computer Engineering, or science related field...  ...display solid knowledge of GPU-focused hardware/...  ...ability to multitask, and high attention to detail.... 
    Full time
    Local area
    Remote work

    Cambridge Computer Services, Inc

    San Francisco, CA
    3 days ago
  • $140k - $225k

    A leading innovation lab in San Francisco seeks a Senior Antenna Engineer to lead high-performance antenna systems from conception to production. The ideal candidate will have over 10 years of experience in consumer electronics and expertise in antenna design. Responsibilities... 
    Senior

    Hp Iq

    San Francisco, CA
    2 days ago
  • Epoch Biodesign in San Francisco is seeking a Senior Staff Cloud Support Engineer to lead technical escalations and improve cloud infrastructure...  ...influence architectural decisions while ensuring high availability for AI workloads. The ideal candidate has over 8 years of... 
    Senior

    Epoch Biodesign

    San Francisco, CA
    3 days ago
  •  .... When people finance GPU clusters, the datacenters housing...  ...market? Otherwise, as AI scales, compute only...  ...culture, mentor junior engineers, and learn from our customers...  ...scaling at least one HPC or GPU compute cluster...  ...troubleshooting high‑speed fabrics such as InfiniBand... 
    Long term contract
    Contract work
    Fixed term contract
    Work at office
    Local area
    Visa sponsorship
    Shift work
    3 days per week

    The San Francisco Compute Company

    San Francisco, CA
    4 days ago
  •  ...CoreHPC team at UCSF Health is looking for an HPC Systems Engineer to enhance and maintain the Institute’s HPC clusters. The role involves defining and implementing complex...  ...experience with HPC systems, and expertise in high-performance parallel filesystems like Lustre... 
    Senior

    UCSF Health

    San Francisco, CA
    42 minutes ago