Principal/Staff HPC Network Engineer

The San Francisco Compute Company

We’re building the company that will de-risk the largest infrastructure build-out in history.

When people finance GPU clusters, the datacenters housing them, and the infrastructure powering them, they need "offtake" - meaning someone has signed a contract to lease the cluster for a period of time before it’s even built. Financing a GPU cluster is inherently risky, since margins are thin and volumes are huge. Lenders don’t want to take on the risk that cluster developers can’t repay their loans, and cluster developers really don’t want to risk not selling their cluster. As a result, risk is offloaded to the customer using fixed-price long-term contracts. If you don’t mitigate this customer risk, there’s a bubble.

This isn’t SaaS anymore - application-layer companies sign multi-year contracts for compute and inference, but sell to customers on monthly subscriptions. If you mess up a purchase, it’s game over: a minor shift in your revenue growth rate might mean the difference between profit and bankruptcy.

But what if companies could exit their contract by selling it back to the market? Otherwise, as AI scales, compute only becomes available to folks who can effectively take on that risk. A 2-person startup in a San Francisco Victorian can’t realistically sign a 5-year take-or-pay contract on a $100m supercomputer. But they may be able to buy the month of liquidity that someone else sold back.

So that’s what we make: a liquid market for GPU offtake.

About the Role

GPU clusters are some of the most performant computers on the planet. Even smaller clusters by today’s standards would have ranked in the TOP500 five years ago. Our infrastructure team is responsible for architecting and deploying new clusters around the world and keeping them running smoothly. You’ll participate in the on-call rotation, fix issues when they arise, and lean into automation to enable deployments at scale.

We’re a small but ambitious team, so you’ll be an early contributor helping to shape culture, mentor junior engineers, and learn from our customers.

About You

You have 10+ years of experience with hands-on management or architecture of the network for at least one GPU cluster (ideally a cluster with >1k GPUs, but not required)
You deeply understand the fundamentals of Ethernet (RoCEv2) and/or InfiniBand networks in Clos/fat-tree topologies
You have built HPC network architectures (eBGP, fat-tree, VXLAN, MCLAG, etc.)
The idea of implementing zero-touch provisioning for a large multi-layer network excites you (you embrace automation)
You appreciate, value, and generate good documentation
You have the ability and willingness to mentor junior engineers
You’re open to coming in to our San Francisco office 3-4 days per week

Some Nice to Haves

You understand data center concepts including power, cooling, and how to engage with colo providers
You have experience with Linux systems administration, including managing kernel drivers and tuning the network stack
You have experience with Linux virtualization (KVM, QEMU, libvirt, etc.)
You’ve had exposure to containers and Kubernetes operators

Benefits

Generous equity grant: Team members are offered a competitive salary along with equity in the company
Visa sponsorships: Yes, we sponsor visas and work permits
Retirement matching: We match 401(k) plans up to 4%
Medical, dental & vision: We offer competitive medical, dental, and vision insurance for employees and dependents and cover 100% of premiums
Time off: We offer unlimited paid time off as well as 10+ observed holidays
Parental leave: We offer biological, adoptive, and foster parents paid time off to spend quality time with family
Daily lunch: We cover lunch daily for employees
Unlimited office book budget: You can buy as many books for the office as you want

The San Francisco Compute Company is committed to maintaining a workplace free from discrimination and harassment. We make employment decisions based on business needs, job requirements, and individual qualifications, without regard to race, color, religion, belief, national origin, social or ethnic origin, age, physical, mental, or sensory disability, sexual orientation, gender identity or expression, marital status, civil union or domestic partnership status, past or present military service, HIV status, family medical history or genetic information, family or parental status including pregnancy, or any other status protected by law.

We welcome the opportunity to consider qualified applicants with prior arrest or conviction records. Our commitment to diversity includes hiring talented individuals regardless of their criminal history, in accordance with local, state, and federal laws, including San Francisco’s Fair Chance Ordinance and California’s ban-the-box laws.

If you require reasonable accommodation for any reason, please reach out to us.

Vacancy posted 4 days ago

