Staff Network Reliability Engineer - Scale & Incident Response

$195k - $235k

Crusoe Energy Systems LLC

Crusoe Energy Systems LLC is looking for a Staff Network Operations Engineer to ensure production reliability across its global network infrastructure. This role is critical in maintaining uptime and facilitating AI workloads via incident response and operational excellence. The ideal candidate has 8+ years of experience in network engineering, specializing in operations and incident response. You'll work with advanced monitoring tools and help shape the future of AI infrastructure. Compensation ranges from $195,000 to $235,000, plus bonuses and stock options.

Vacancy posted 4 days ago
Similar jobs that could be interesting for you
Based on the Staff Network Reliability Engineer - Scale & Incident Response in San Francisco, CA vacancy
  • $200k - $240k

    A leading AI startup in San Francisco is seeking a Staff Software Engineer to help define the future of incident response by creating an autonomous AI SRE. You will design complex data flows, drive product direction, and maintain high engineering standards across the stack... 
    Suggested

    Jack & Jill/External ATS

    San Francisco, CA
    2 days ago
  • $225k - $275k

     ...Francisco is looking for a Senior Staff Network Operations Engineer to ensure production reliability across its global network. In this role, you will lead incident response and define key operational...  ...track records in reliability at scale. The position offers competitive... 
    Suggested

    Crusoe Energy Systems LLC

    San Francisco, CA
    2 days ago
  •  ...to perform under real-world scale, reliability, and security demands — and we're looking for an engineer who wants to own the...  ...design and operate the global network and reliability layer behind...  ...monitoring, alerting, and incident response — SLOs, runbooks, and on-call... 
    Suggested

    Alembic, Inc.

    San Francisco, CA
    3 days ago
  •  ...A leading infrastructure company is seeking a Network Engineer, Reliability & Observability to enhance AI network reliability. This role involves...  ...candidates have over 5 years in networking, strong incident response skills, and experience with data center networks. A... 
    Suggested

    Fluidstack

    San Francisco, CA
    4 days ago
  •  ...Crusoe in San Francisco is looking for a Senior Staff Network Operations Engineer to oversee the reliability of its global network. This role entails leading incident responses, defining operational standards, and guiding a team of engineers in maintaining a high-performing... 
    Suggested

    ProducePay

    San Francisco, CA
    9 hours ago
  • $243k - $284k

     ...P2P is hiring a Senior Incident Response Engineer in San Francisco to lead incident triage and response across AWS and GCP. In this role, you will protect the firm from threats like capital call wire fraud and organized criminal operations. Candidates should have over... 

    P2P

    San Francisco, CA
    9 hours ago
  • $250k - $350k

     ...persistent, and well-resourced anywhere. We are building Detection & Response Engineering from the ground up: engineering-led, agent-first, and built to scale across IT, OT, and physical surfaces. As the Staff Incident Responder, you are the most senior incident commander in the... 
    Contract work
    Local area

    Fluidstack

    San Francisco, CA
    9 hours ago
  • $250k - $350k

     ...spanning hardware and software. Speed and scale are our key differentiators. Come be a...  ...technology in human history, and being responsible for the physical and logical security of...  ...small. Role Scope Run material incidents as incident commander, coordinating... 
    Contract work
    Local area

    Fluidstack

    San Francisco, CA
    3 days ago
  • $182k - $250k

     ...Senior Platform Reliability Engineer Grow Therapy is on a mission to serve as the trusted partner...  ...Engineer to help define and scale reliability as a first-class capability...  ...around observability, SLOs/SLAs, and incident response—while also helping translate those standards... 
    Full time
    Work at office
    Local area
    Remote work
    Home office
    Flexible hours
    Day shift
    3 days per week

    Grow Therapy

    San Francisco, CA
    21 hours ago
  • $200k - $250k

     ...infrastructure to ensure the platform is reliable, fast, and resilient as we scale. Role Mission Own service reliability end-to-end: prevent incidents, reduce blast radius when failures...  ...command quality: Lead Sev1/Sev2 response end-to-end (containment, communications... 
    Permanent employment

    Vizcom

    San Francisco, CA
    3 days ago
  • $200k - $250k

     ...This hands-on technical leadership role demands expertise in service reliability to ensure the platform's performance as it scales. Responsibilities include setting reliability standards, managing incident responses, and driving architectural resilience using Kubernetes... 

    Vizcom

    San Francisco, CA
    3 days ago
  • Founding Platform & Reliability Engineer About OpenArt OpenArt is an AI Storytelling and Visual...  ...real systems, not slices. Ship at real scale, your work goes to millions of users,...  ...in an on-call rotation and lead incident response improvements (alert quality, runbooks... 
    Remote work
    Worldwide
    Visa sponsorship

    Embedding VC

    San Francisco, CA
    5 days ago
  • Overview Senior Platform & Reliability Engineer OpenArt is an AI Storytelling and Visual Creation...  ...systems, not slices. Ship at real scale, your work goes to millions of users,...  ...Participate in an on-call rotation and improve incident response (alert quality, run books, escalation... 
    Remote work
    Worldwide
    Visa sponsorship

    OpenArt AI

    San Francisco, CA
    2 days ago
  • A leading language learning platform is seeking an experienced SRE Engineer to ensure the reliability and resilience of their infrastructure. Responsibilities include leading incident response, improving observability, and collaborating with various teams to enhance platform... 

    Speak

    San Francisco, CA
    6 days ago
  •  ...dynamic tech firm located in San Francisco is seeking a Site Reliability Engineer to enhance operational health across their production...  ...You will manage production systems' reliability and lead incident response efforts to prevent issues, all while contributing to the... 

    gamma.app

    San Francisco, CA
    6 days ago
  • $202.8k - $327.63k

     ...Director, SRE Platform Engineering is a senior engineering leader responsible for bringing production...  ...Management (ITSM) and Site Reliability Engineering (SRE)...  ...global workforce Evolve incident response into a highly...  ...Developer Platforms (IDP) at scale Background in building... 
    Permanent employment
    Contract work
    Work at office
    Local area
    Remote work
    2 days per week

    DocuSign, Inc.

    San Francisco, CA
    2 days ago
  •  ...A leading AI research company based in San Francisco is seeking experienced reliability engineers to scale their infrastructure and ensure system performance and reliability. This role involves collaborating with diverse teams to develop resilient systems and enhance... 

    OpenAI

    San Francisco, CA
    4 days ago
  •  ...cloud environments. As we scale, reliability, observability, and security...  .... We’re hiring our first engineer fully dedicated to the...  ...stability monitoring and incident response security and least-privilege...  ...Go, Rust, or C++ Strong networking + security intuition, including... 

    Sieve

    San Francisco, CA
    6 days ago
  • $150k - $170k

     ...looking for an Integration Reliability Engineer to own the...  ...warehouses. This role is responsible for making systems observable...  ...and repeatable as we scale across deployments,...  ...Define and improve incident response, severity...  ...across infrastructure, networking, and distributed... 
    Permanent employment

    Claryo, Inc.

    San Francisco, CA
    9 hours ago
  •  ...SRE to join our engineering team at Plenful and...  ...ownership of the reliability and performance...  ...influence how we build, scale and operate our...  ...solving during incidents and a practical...  ...Lead incident response, coordinate root...  ...across compute, networking and storage. Security... 
    Work at office
    Remote work
    Flexible hours
    2 days per week

    Plenful

    San Francisco, CA
    1 day ago
  • $225k - $275k

     ...Team The Infrastructure Engineering function sits within IT and is responsible for reliably building, deploying,...  ...operational leverage as OpenAI scales. About the Role We are...  ..., Identity, and Network teams to ensure...  ...monitoring, alerting, and incident response mechanisms to... 
    Full time
    Work at office
    Local area
    Relocation package
    Flexible hours

    Slope

    San Francisco, CA
    9 hours ago
  • $150k - $250k

     ...As our Founding Security Reliability Engineer at Charta Health, you'll pioneer...  ...opportunity to build and scale the foundational security...  ...mitigation, and efficient incident response. You'll be crucial in engineering...  ...(primarily AWS), including network security, identity and... 

    Charta Health

    San Francisco, CA
    3 days ago
  • $150k - $250k

     ...hardware and software. Speed and scale are our key differentiators....  ...and validate data center network infrastructure (front-end,...  ...ICT, Hardware, and Network Engineering to identify blockers early,...  ...during and after deployments: incident response, troubleshooting, and break-... 
    Local area

    Fluidstack

    San Francisco, CA
    9 hours ago
  •  ...of AI infrastructure: large-scale AI datacenters and the...  ...Gimlet Labs is seeking a Network Engineer to design, build, and scale...  ...operations teams to improve network reliability, deployment velocity,...  ...deployment validation, and incident response workflows. You may be a... 

    Gimlet Labs

    San Francisco, CA
    1 day ago
  •  ...A technology solutions provider is looking for a Network Engineer to enhance and maintain a large-scale network. This role involves managing both wired and wireless infrastructures, conducting assessments, and ensuring network security. Candidates should have a degree... 

    CGS Federal (Contact Government Services)

    San Francisco, CA
    9 hours ago
  • $130k - $160k

     ...YOU WILL DO: As part of the Network, Identity, and Security Team...  ...others, and managing incidents. A typical work week might include...  ...infrastructure that scales and operates efficiently....  ...automate your work. KEY RESPONSIBILITIES: The Network, Identity, and... 
    Work at office
    Remote work

    EOS https://app2.greenhouse.io/job_boards/4008206002/setting...

    San Francisco, CA
    22 hours ago
  •  ...mission is to create reliable, interpretable, and steerable...  ...researchers, engineers, policy experts, and...  ...infrastructure — the network, compute, and storage...  ...clouds and regions. The scale is real, the spend is...  ...error budgets, and incident response for network‑impacting... 

    United States Digital Space LLC

    San Francisco, CA
    4 days ago
  •  ...Senior Database Reliability Engineer Scribe is where exceptional people come to do the best work of their...  ...index builds, NOT VALID constraints), and incident response for the data tier Make the Django ORM a strength at scale: catch N+1 patterns in review, extend... 
    Full time
    Work at office
    Remote work
    Home office
    Flexible hours
    3 days per week

    Scribe

    San Francisco, CA
    3 days ago
  •  ...What you’ll do As a Senior / Staff Network Engineer, you will define the...  ...Alibaba Cloud) at massive scale. Acting as a principal technical...  ...or San Francisco. Responsibilities: Build the foundations of...  ...network issues, running an incident through to completion and... 
    Flexible hours
    Weekend work

    Airwallex

    San Francisco, CA
    4 days ago
  • B Capital is looking for a Production Support Engineer in San Francisco. You'll play a key role in ensuring the reliability of the Agentforce Supply Chain platform and work with an agile team on scaling the product and automating infrastructure. The ideal candidate has... 

    B Capital

    San Francisco, CA
    5 days ago
