Staff Network Engineer, Operations
$195k - $235k
Crusoe
Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack - from electrons to tokens - to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster.
We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that - with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI. We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved - people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services. If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.
About This Role:
Crusoe Cloud is seeking a Staff Network Operations Engineer to help own production reliability across our global network infrastructure, including edge, backbone, data center fabric, and GPU cluster interconnects. This is a hands-on production ownership role focused on incident response, root cause analysis, and operational excellence initiatives that keep our hyperscale AI infrastructure running at scale. Your work will directly affect the availability of AI workloads running across thousands of GPUs worldwide. The ideal candidate is a seasoned network engineer with deep operational experience in large-scale environments who thrives in high-pressure situations and takes pride in keeping systems healthy. You'll contribute to defining SLIs and SLOs, improving observability tooling, building automation to reduce toil, and mentoring peers - all while serving as a key escalation point during high-severity network events.
What You'll Be Working On:
- Production Reliability: Help own uptime across Crusoe's global edge, backbone, data center, and GPU cluster network, directly supporting AI workloads at scale.
- Incident Response: Lead and contribute to end-to-end response for high-severity network events, including mitigation, stakeholder communication, and postmortem documentation.
- Root Cause Analysis: Drive RCAs for production incidents, identify systemic issues, and author remediation plans tracked through to closure.
- Observability Improvements: Contribute to and improve Crusoe's network monitoring stack using streaming telemetry, SNMP, NetFlow, and tools such as Kentik, Grafana, Prometheus, and ThousandEyes.
- Operational Standards: Author and maintain runbooks, escalation playbooks, and SOPs used across the operations team.
- Operational Automation: Write Python-based tooling to reduce toil, automate common remediation workflows, and accelerate mean time to resolution.
- SLI/SLO Contribution: Partner with Architecture and SRE teams to define and track network reliability metrics and service level objectives backed by real-time dashboards.
- Mentorship: Provide technical guidance to Senior engineers and contribute to a culture of operational excellence and continuous learning.
- 8+ years of production network engineering experience with a focus on operations, incident response, and reliability in large-scale or internet-scale environments.
- Hands-on experience with observability and monitoring tools including streaming telemetry, SNMP, NetFlow/sFlow, Grafana, Prometheus, and ThousandEyes.
- Experience operating RDMA/RoCE lossless fabrics for GPU or HPC workloads, including familiarity with PFC, ECN, and DCQCN tuning.
- Expert hands-on knowledge of BGP, EVPN-VXLAN, IS-IS, OSPF, MPLS, QoS, and TCP/IP in production data center environments.
- Proficiency with Arista (EOS) and Juniper (Junos) platforms in leaf-spine CLOS architectures across multi-vendor environments.
- Python proficiency for writing auto-remediation scripts, diagnostic tooling, and operational automation.
- Comfort operating large device fleets across multi-region environments with on-call responsibility, including experience as an escalation point during critical events.
- Bachelor's degree in Computer Science, Electrical Engineering, or a related field, or equivalent practical experience.
- Experience with NVIDIA/Mellanox networking platforms in GPU cluster environments.
- Familiarity with Kentik or Arbor for traffic analysis and DDoS visibility.
- Experience defining or contributing to SLIs and SLOs in partnership with SRE or product teams.
- Exposure to operating 10K+ device fleets across hyperscale or cloud environments.
- Background contributing to post-incident learning programs or operational excellence initiatives org-wide.
- Competitive compensation and equity packages
- Restricted Stock Units
- Paid time off, paid holidays & leave of absence programs
- Comprehensive health, dental & vision insurance
- Employer contributions to HSA
- Paid parental leave
- Paid life insurance, short-term and long-term disability
- Professional development & tuition reimbursement
- Mental health & wellness support
- Commuter benefits (parking & transit)
- Cell phone stipend
- 401(k) Retirement plan with company match up to 4% of salary
- Volunteer time off
- Global travel insurance & emergency assistance
- Daily meals allowance
- Additional perks & programs specific to location
Vacancy posted 2 days ago

