
Staff Infrastructure Engineer, Cluster Infrastructure

$320k - $405k

United States Digital Space LLC

About the company

The company’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the role

The company's Infrastructure organization is foundational to our mission of developing AI systems that are reliable, interpretable, and steerable. The systems we build determine how quickly we can train new models, how reliably we can run safety experiments, and how effectively we can scale Claude to millions of users, demonstrating that safe, reliable infrastructure and frontier capabilities can go hand in hand.

Cluster Infra owns the full lifecycle of compute clusters at the company. We build agent-driven automation for cluster provisioning and lifecycle management across all major cloud providers and our own datacenters. Our systems stand up clusters that are interconnected with high bandwidth, secure-by-default, and able to automatically drain and recover in response to failure.

As a Staff engineer on this team, you'll set the technical direction for how the company brings compute online, at a moment when the scale of that compute is growing faster than at almost any company in the world.

Key responsibilities

  • Own the technical strategy and roadmap for agent-driven cluster lifecycle management: provisioning, updates, and decommissioning
  • Partner across teams to ensure new compute capacity is ingested on time
  • Align with partner teams on physical build-out and leverage cloud solutions to deliver high-bandwidth inter-cluster connectivity
  • Collaborate with security owners to ensure clusters are provisioned secure-by-default
  • Define and drive strategy on cluster scalability, homogeneity, and fault tolerance
  • Work closely with cloud providers and internal research, inference, and product teams to shape long-term compute, data, and infrastructure strategy
  • Establish and evolve operational-excellence practices: incident response, postmortem culture, and on-call health
  • Support the growth of engineers around you through technical mentorship and coaching

Minimum qualifications

  • Deep expertise in distributed systems, reliability, and cloud platforms (e.g., Kubernetes, IaC, AWS/GCP/Azure)
  • Strong proficiency in at least one systems language (e.g., Rust, Go, or Python) and IaC proficiency with Terraform
  • Track record of leading complex, multi-quarter technical initiatives spanning multiple teams or systems
  • Ability to build alignment across senior stakeholders and communicate effectively at all levels

Preferred qualifications

  • 8+ years of software engineering experience, including time as a technical lead setting direction for a team
  • Experience operating large-scale compute infrastructure at hyperscale (100+ clusters, 10K+ nodes)
  • Depth in one or more of: Kubernetes internals, cluster provisioning and management systems, cluster orchestration systems (Mesos, Borg-like)
  • Experience with cloud networking: VPC design and peering, Shared VPC/Transit Gateway, Cloud Interconnect/Direct Connect, Cloud NAT, cross-cloud private connectivity, BGP and route control, edge load balancing and DDoS mitigation (Cloud Armor / AWS Shield)
  • Experience with cluster and host networking: CNI (Cilium), eBPF, NetworkPolicy, multi-NIC, sFlow, service mesh (Istio/Envoy/Linkerd, mTLS)
  • Experience with cluster security: pod security standards and admission control, RBAC and least-privilege IAM, node and container hardening, supply-chain/image provenance
  • Deep experience with infrastructure-as-code (Terraform, Atlantis) and workflow orchestration (Temporal, Argo Workflows)
  • Skill in quickly understanding systems design tradeoffs and keeping track of rapidly evolving software systems

Compensation

$320,000 — $405,000 USD

Logistics

  • Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience
  • Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience
  • Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position
  • Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.
  • Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.

How we're different

We believe that the highest-impact AI research will be big science. At the company we work as a single cohesive team on just a few large-scale research efforts. And we value impact (advancing our long-term goals of steerable, trustworthy AI) rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We're an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication.

Vacancy posted 4 hours ago
Similar jobs that could be interesting for you
Based on the Staff Infrastructure Engineer, Cluster Infrastructure in San Francisco, CA vacancy
  • $224k - $284k

     ...improve them until they work at scale. We are roboticists, engineers, operators, and builders. We believe the next great...  ...world impact, join us. What you’ll do We’re seeking a Cluster Infrastructure Engineer to join our founding team who will own the GPU compute... 
    Suggested
    Full time
    Work at office
    Flexible hours

    ATOMS Careers page

    San Francisco, CA
    2 days ago
  • Staff Infrastructure & Performance Engineer: As a Staff Infrastructure & Performance Engineer, you will own and evolve the performance, reliability, and...  ...indexing strategies, connection management, replication, cluster design, and failover. Architect and operate multi-... 
    Suggested
    Fixed term contract
    Flexible hours

    Nash

    San Francisco, CA
    14 hours ago
  • $300 per month

     ...energy and intelligence. We’re crafting the engine that powers a world where people can...  ...for responsible, transformative cloud infrastructure. About this role Crusoe Cloud Network...  ...network for High-Performance Compute (HPC) Clusters with GPUs. The ideal individual will be... 
    Suggested
    Temporary work
    Work experience placement
    Work at office

    Crusoe Energy Systems LLC

    San Francisco, CA
    4 days ago
  • $195k - $235k

     ...the only vertically integrated AI infrastructure company built from the ground up, we...  ...This Role Crusoe Cloud is seeking a Staff Network Operations Engineer to help own production reliability...  ...backbone, data center fabric, and GPU cluster interconnects. This is a hands‑on... 
    Suggested
    Temporary work
    Worldwide

    Crusoe Energy Systems LLC

    San Francisco, CA
    4 days ago
  • $193k - $234k

     ...the only vertically integrated AI infrastructure company built from the ground up, we...  ...seeking a high-energy, detail-oriented Staff Network Deployment Engineer to lead the physical and logical...  ...testing (SAT) for new network clusters, ensuring zero-defect handovers to... 
    Suggested
    Temporary work
    Remote work

    Crusoe Energy Systems

    San Francisco, CA
    1 day ago
  • $160k - $300k

     ...Databricks, GM, and Character, our mission is to revolutionize how engineering decisions are made, turning complexity into clarity for the...  ...company together. About the Role As a Senior / Staff Infrastructure Engineer at Apiphany, you’ll design, build, and operate the... 
    Work at office
    Visa sponsorship
    Flexible hours

    Apiphany

    San Francisco, CA
    4 days ago
  • $250k - $325k

     ...Francisco, CA Employment Type Full time Department Engineering Compensation $250K - $325K We're building the company which will de-risk the largest infrastructure build‑out in history. When people finance GPU clusters, the datacenters housing them, and the... 
    Long term contract
    Full time
    Contract work
    Fixed term contract
    Work at office
    Local area
    Visa sponsorship
    Shift work
    3 days per week

    Electric Capital

    San Francisco, CA
    14 hours ago
  • $193k - $234k

     ...the only vertically integrated AI infrastructure company built from the ground up, we...  ...seeking a high-energy, detail-oriented Staff Network Deployment Engineer to lead the physical and logical...  ...networks for GPU compute clusters. As we rapidly expand our footprint... 
    Temporary work
    Work at office
    Remote work

    Crusoe Energy Systems LLC

    San Francisco, CA
    2 days ago
  • $224k - $284k

     ...they work at scale. We are roboticists, engineers, operators, and builders. We believe...  ...and the architecture to scale it as our cluster grows. Design, optimize, and scale the...  ...efficiency. Collaborate across the infrastructure team to solve cross-discipline problems... 
    Full time
    Work at office
    Immediate start
    Flexible hours

    Atoms

    San Francisco, CA
    2 days ago
  •  ...Role Abridge’s services and engineering teams are in hyperscale mode...  ...are looking for experienced Staff Platform Engineers to join our...  ...and help scale our cloud infrastructure, developer platform, and operational...  ...upgrades, and multi-tenant cluster design. Experience designing... 
    Hourly pay
    Full time
    Local area
    Remote work
    Flexible hours

    Neura Market

    San Francisco, CA
    1 day ago
  •  ...About Us We’re building the AI infrastructure powering the future of financial operations - starting with automating the most...  ...performance matter most. About the Role We're looking for a Staff Infrastructure Engineer to architect and own the systems that power Salient at... 
    Full time
    Work at office

    Salient

    San Francisco, CA
    14 hours ago
  • $276.5k - $300k

     ...be owned by everyone. About the Team Our Infrastructure team is a collaborative group of experienced engineers dedicated to supporting the World project's mission...  ...About the Opportunity We are looking for a Staff Infrastructure Engineer to help establish our team... 
    Flexible hours

    World Coin

    San Francisco, CA
    14 hours ago
  •  ...This role is infrastructure-first, with a second gear in backend or QA. Hamilton is building the operating system for charter aviation...  ..., and resilient. That's your job. We're hiring a Staff Platform Engineer to own the infrastructure and internal platforms that let... 
    Second job
    Visa sponsorship

    Hamilton AI

    San Francisco, CA
    14 hours ago
  • Gimlet is building AI infrastructure and orchestration platforms for large-scale AI datacenters. This Infrastructure/Cluster Engineer role involves designing, building, and operating heterogeneous cluster infrastructure that intelligently routes workloads across diverse... 

    Linuxcareers

    San Francisco, CA
    4 days ago
  • $300 per month

     ...intelligence. We’re crafting the engine that powers a world where people can...  ...responsible, transformative cloud infrastructure. About the Role As a Senior Staff Cloud Support Engineer, you are a...  ...(Slurm, Terraform), and AI/ML cluster stability. Reduce MTTR and incident... 
    Full time
    Temporary work

    Epoch Biodesign

    San Francisco, CA
    3 days ago
  • $300 per month

     ...and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate...  ...with us at Crusoe. About the Role: We are seeking a Staff Software Infrastructure Engineer to play a critical role in managing Crusoe’s fleet operations... 
    Temporary work

    Crusoe

    San Francisco, CA
    2 days ago
  • A leading AI infrastructure company is seeking a Staff Infrastructure Engineer in San Francisco. In this role, you will own the systems that power the company at scale, focusing on reliability, scalability, and developer velocity. You will be responsible for designing cloud... 
    Work at office

    Salient

    San Francisco, CA
    14 hours ago
  • We are looking for an AI Infra engineer to join our growing team. We work with Kubernetes...  ..., and primarily on AWS. As an AI Infrastructure Engineer, you will be partnering closely...  ...large-scale AI training and inference clusters Responsibilities Design, deploy, and maintain... 

    Perplexity

    San Francisco, CA
    2 days ago
  • $178k - $267k

     ...Ireland. Come join us! About the Team Our diverse Product & Engineering team values innovation, collaboration, and the continuous improvement...  ...software engineer to join our AI & Search mission. This cluster of teams is responsible for developing the next‑generation Generative... 
    Local area

    BetterCloud

    San Francisco, CA
    2 days ago
  • What you’ll do As a Senior / Staff Network Engineer, you will define the global technical strategy, architecture, and roadmap for Airwallex’s enterprise and cloud network infrastructure. You will design and deploy highly secure, multi-region hybrid network patterns that... 
    Flexible hours
    Weekend work

    Airwallex

    San Francisco, CA
    3 days ago
  • $320k - $405k

     ...growing group of committed researchers, engineers, policy experts, and business leaders...  ...systems and routers. We're looking for a Staff Fiber Network Engineer to own the...  ...and wavelength options from carriers and infrastructure providers. Run RFPs, compare bids on cost... 
    Visa sponsorship
    Night shift

    anthropic

    San Francisco, CA
    1 day ago
  • $215k - $265k

     ...controls, and automation across the org. We’re looking for a Staff Analytics Engineer to build and own our Financial Subledger Data Platform —...  ...expertise: architecture patterns (micro‑partitioning/clustering, query optimization), security/governance (RBAC, masking policies... 
    Work at office
    Remote work
    Flexible hours

    Affirm

    San Francisco, CA
    1 day ago
  • Crusoe Energy Systems in Sunnyvale is looking for a Staff Network Deployment Engineer to lead the deployment of network infrastructure across data centers. The role involves managing technical implementations and ensuring compliance with high-performance standards. Ideal... 

    Crusoe Energy Systems

    San Francisco, CA
    5 days ago
  • B Capital is looking for a Staff Software Engineer to join the Data Infrastructure team. The role focuses on building secure and scalable data infrastructure for Slack’s analytics and decision-making. Key responsibilities include designing data services, ensuring reliability... 

    B Capital

    San Francisco, CA
    1 day ago
  • Slack Enterprise seeks a Staff Software Engineer to join its Data Infrastructure team. This role includes designing and building high-performance data systems that support analytics and machine learning needs. Candidates should have over 10 years of experience in software... 

    Slack Enterprise

    San Francisco, CA
    3 days ago
  •  ...superintelligence. To achieve this, we need more great engineers. The work affects millions of people...  ...The Role We're looking for a frontend infrastructure engineer to build the tools and systems...  ...that scale as the codebase grows As a staff engineer, you'll make decisions about... 

    Giga

    San Francisco, CA
    4 days ago
  • 100 Salesforce, Inc. is looking for a Staff Software Engineer to join the Data Infrastructure team. This role involves designing and operating reliable, scalable data infrastructure that supports analytics and machine learning workflows. The ideal candidate will have 10... 

    100 Salesforce, Inc.

    San Francisco, CA
    1 day ago
  • Crusoe in San Francisco is looking for a Senior Staff Network Operations Engineer to oversee the reliability of its global network. This role entails...  ...a team of engineers in maintaining a high-performing infrastructure. The ideal candidate will have over 12 years of... 

    ProducePay

    San Francisco, CA
    2 days ago
  • $195k - $235k

    Crusoe Energy Systems LLC is looking for a Staff Network Operations Engineer to ensure production reliability across its global network infrastructure. This role is critical in maintaining uptime and facilitating AI workloads via incident response and operational excellence... 

    Crusoe Energy Systems LLC

    San Francisco, CA
    14 hours ago
  • $225k - $275k

    Crusoe Energy Systems LLC in San Francisco is looking for a Senior Staff Network Operations Engineer to ensure production reliability across its global network. In this role, you will lead incident response and define key operational standards. Ideal candidates will bring... 

    Crusoe Energy Systems LLC

    San Francisco, CA
    3 days ago
