Staff Infrastructure Engineer, Cluster Infrastructure
United States Digital Space LLC
$320k - $405k
About the company
The company’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the role
The company's Infrastructure organization is foundational to our mission of developing AI systems that are reliable, interpretable, and steerable. The systems we build determine how quickly we can train new models, how reliably we can run safety experiments, and how effectively we can scale Claude to millions of users, demonstrating that safe, reliable infrastructure and frontier capabilities can go hand in hand.

Cluster Infra owns the full lifecycle of compute clusters at the company. We build agent-driven automation for cluster provisioning and lifecycle management across all major cloud providers and our own datacenters. Our systems stand up clusters that are interconnected with high bandwidth, secure by default, and able to automatically drain and recover in response to failure.

As a Staff engineer on this team, you'll set the technical direction for how the company brings compute online, at a moment when the scale of that compute is growing faster than at almost any company in the world.

Key responsibilities
- Own the technical strategy and roadmap for agent-driven cluster lifecycle management: provisioning, updates, and decommissioning
- Partner across teams to ensure new compute capacity is ingested on time
- Align with partner teams on physical build-out and leverage cloud solutions to deliver high-bandwidth inter-cluster connectivity
- Collaborate with security owners to ensure clusters are provisioned secure-by-default
- Define and drive strategy on cluster scalability, homogeneity, and fault tolerance
- Work closely with cloud providers and internal research, inference, and product teams to shape long-term compute, data, and infrastructure strategy
- Establish and evolve operational-excellence practices: incident response, postmortem culture, and on-call health
- Support the growth of engineers around you through technical mentorship and coaching

Minimum qualifications
- Deep expertise in distributed systems, reliability, and cloud platforms (e.g., Kubernetes, IaC, AWS/GCP/Azure)
- Strong proficiency in at least one systems language (e.g., Rust, Go, or Python), and IaC proficiency with Terraform
- Track record of leading complex, multi-quarter technical initiatives spanning multiple teams or systems
- Ability to build alignment across senior stakeholders and communicate effectively at all levels

Preferred qualifications
- 8+ years of software engineering experience, including time as a technical lead setting direction for a team
- Experience operating large-scale compute infrastructure at hyperscale (100+ clusters, 10K+ nodes)
- Depth in one or more of: Kubernetes internals, cluster provisioning and management systems, cluster orchestration systems (Mesos, Borg-like systems)
- Experience with cloud networking: VPC design and peering, Shared VPC/Transit Gateway, Cloud Interconnect/Direct Connect, Cloud NAT, cross-cloud private connectivity, BGP and route control, edge load balancing, and DDoS mitigation (Cloud Armor/AWS Shield)
- Experience with cluster and host networking: CNI (Cilium), eBPF, NetworkPolicy, multi-NIC, sFlow, service mesh (Istio/Envoy/Linkerd, mTLS)
- Experience with cluster security: pod security standards and admission control, RBAC and least-privilege IAM, node and container hardening, supply-chain/image provenance
- Deep experience with infrastructure-as-code (Terraform, Atlantis) and workflow orchestration (Temporal, Argo Workflows)
- Skill in quickly understanding systems design tradeoffs and keeping track of rapidly evolving software systems

Compensation
$320,000 - $405,000 USD

Logistics
- Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience
- Required field of study: A field relevant to the role, as demonstrated through coursework, training, or professional experience
- Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position
- Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.
- Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.

How we're different
We believe that the highest-impact AI research will be big science. At the company, we work as a single cohesive team on just a few large-scale research efforts. And we value impact (advancing our long-term goals of steerable, trustworthy AI) over work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We're an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication.


