Network Engineer, Capacity and Efficiency

United States Digital Space LLC

About the company

The company's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the team

The Capacity & Efficiency team sits inside the company's Compute organization and owns the cost, utilization, and attribution story for non-accelerator infrastructure: the network, compute, and storage backbone that moves petabytes between training clusters, inference fleets, and object storage across clouds and regions. The scale is real, the spend is large, and the efficiency levers are still mostly unpulled.

We work alongside the Systems Networking team (who build and operate the fabric) and the Observability team (who own the telemetry platform). This role lives at the intersection: you'll use deep networking knowledge and rigorous measurement to figure out where and how bandwidth, latency, and dollars are being used, find optimization opportunities, and land them.

About the role

We're looking for a network engineer who thinks in metrics first. You understand spine-leaf fabrics, BGP, SDN overlays, and cloud interconnect products well enough to build them. You will instrument them, model their cost-per-bit, and squeeze out the inefficiency, while ensuring we can move the bits to the right places in the most efficient manner. You'll own the observability and efficiency surface for the company's network, from per-flow telemetry on backbone routers to cost attribution that tells a research team exactly what their checkpoint sync is costing.

This is a hands-on IC role. You'll write code (Python, Go), build dashboards, and model capacity. You'll also influence architecture: when the data says a traffic pattern is pathological, you'll be in the room root-causing it and fixing it.

You will be working across three areas: network telemetry, observability, and cost modeling and attribution. We expect you to be strong in at least two and willing to grow into the third. If you're a telemetry-first engineer who's never built a chargeback model, or a traffic engineer who hasn't shipped eBPF probes, apply anyway and tell us which axis you want to grow on.

What you'll do

  • Build the network observability stack. Design and deploy telemetry pipelines (sFlow/IPFIX, gNMI streaming, eBPF host probes) that turn packet counters into per-flow, per-tenant, per-workload cost and utilization data. Own the SLIs for backbone and DCN fabric health.
  • Hunt for efficiency. Analyze inter-region traffic patterns, identify hot links and stranded capacity, and quantify the dollar impact. Build the models that tell us whether we should buy more capacity or move the workload.
  • Own QoS and traffic engineering. Design and operate traffic classification, marking, and shaping across the backbone. Make sure bulk checkpoint transfers don't starve latency-sensitive inference, and that we're not paying premium cross-region rates for traffic that could take the cheap path.
  • Drive cost attribution. Tie network spend (egress, interconnect ports, transit, optical leases) back to the teams and workloads that generate it. Make network cost a first-class input to capacity planning and workload placement decisions.
  • Influence decisions you don't own. A large fraction of this role is convincing other teams to act on what your data shows: making the case to research that a traffic pattern needs to change, to finance that an interconnect tranche is worth buying, to Systems Networking that a QoS policy needs rewriting. You'll partner closely with Systems Networking on fabric architecture and Observability on telemetry platform integration, but the cost and efficiency wins will come from moving teams that don't report to you.
  • Automate. Extend our intent-based network configuration systems and write the tooling that turns your efficiency findings into safe, reviewable, and impactful changes.

You may be a good fit if you

  • Have 5+ years operating large-scale production networks: data center fabrics (spine-leaf, Clos), backbone/WAN, or hyperscaler-adjacent environments.
  • Are genuinely fluent across the stack: BGP (including policy and communities), ECMP, VXLAN/EVPN or equivalent overlays, QoS (DSCP, queuing, shaping), and L1/optical basics (DWDM, coherent, LAGs).
  • Know at least one major CSP's networking model deeply, either AWS (VPC, TGW, Direct Connect, Gateway Load Balancer) or GCP (Shared VPC, Interconnect, Cloud Router, Network Connectivity Center), and understand how their overlays interact with physical underlays.
  • Have built or operated network telemetry at scale: streaming telemetry (gNMI/OpenConfig), flow export (sFlow, IPFIX, NetFlow), or eBPF-based host-side instrumentation. You can reason about sampling, cardinality, and storage tradeoffs.
  • Are comfortable writing Python or Go to build tooling, telemetry pipelines, infrastructure-as-code, and config management and automation for network devices, all of which you'll ship to production.
  • Think quantitatively by default. You reach for a notebook or a Grafana query before you reach for an opinion, and you can turn messy counter data into a defensible cost model.
  • Communicate crisply. You can explain to a finance partner why a 10% egress reduction matters, and to a network engineer why a specific ECMP imbalance is costing real money.

Strong candidates may also have

  • SRE experience for large-scale network infrastructure: designing for reliability, defining SLOs/SLIs for network services, capacity planning with error budgets, and incident response for network-impacting outages at scale.
  • Background on a cloud provider's networking team or a cloud networking product team: building or operating the interconnect, backbone, or SDN control plane from the provider side, not just consuming it as a customer.
  • Familiarity with AI/ML infrastructure traffic patterns like collective communication (all-reduce, all-gather), checkpoint/weight transfer, and inference serving, and with how these stress networks differently from traditional workloads in terms of burst behavior, flow synchronization, and bandwidth symmetry.
  • Experience with HPC fabrics like InfiniBand, RoCE v2, lossless Ethernet, or custom high-radix topologies, and an understanding of how job placement, congestion management, and adaptive routing interact at scale.
  • Background in traffic engineering for large backbones and the operational judgment to know when TE is worth the complexity.
  • Hands-on time with multi-cloud connectivity: cross-cloud peering, private interconnect products, and the billing models that come with them.
  • Experience building cost/chargeback systems for shared infrastructure, or FinOps exposure in a large cloud environment.

Representative projects

  • Build a per-flow cost attribution pipeline that traces every byte of cross-region egress back to the team and workload that generated it.
  • Design QoS policy for the private backbone that prevents bulk checkpoint transfers from starving inference traffic.

Vacancy posted 4 days ago