Staff Network Reliability Engineer - Scale & Incident Response
$195k - $235k
Crusoe Energy Systems LLC
Crusoe Energy Systems LLC is looking for a Staff Network Operations Engineer to ensure production reliability across its global network infrastructure. This role is critical in maintaining uptime and facilitating AI workloads via incident response and operational excellence. The ideal candidate has 8+ years of experience in network engineering, specializing in operations and incident response. You'll work with advanced monitoring tools and help shape the future of AI infrastructure. Compensation ranges from $195,000 to $235,000, plus bonuses and stock options.
$200k - $240k
A leading AI startup in San Francisco is seeking a Staff Software Engineer to help define the future of incident response by creating an autonomous AI SRE. You will design complex data flows, drive product direction, and maintain high engineering standards across the stack...
Suggested
$225k - $275k
...Francisco is looking for a Senior Staff Network Operations Engineer to ensure production reliability across its global network. In this role, you will lead incident response and define key operational... ...track records in reliability at scale. The position offers competitive...
Suggested
$150k - $250k
...hardware and software. Speed and scale are our key... ...Role Fluidstack is seeking a Network Engineer, Reliability & Observability to serve as... ...or compute, responded to incidents at all hours, and debugged... ...media failures. Incident Response Excellence: Proven ability...
Suggested, Local area
$175k - $250k
...Regular Toilet is seeking a Site Reliability Engineer to enhance the reliability and performance... ...team, you will handle critical responsibilities like improving incident responses and collaborating with... ...our platform runs reliably at scale.
Suggested, Remote job, Flexible hours
- ...to perform under real-world scale, reliability, and security demands, and we're looking for an engineer who wants to own the... ...design and operate the global network and reliability layer behind... ...monitoring, alerting, and incident response: SLOs, runbooks, and on-call...
Suggested, Full time
$250k - $350k
...spanning hardware and software. Speed and scale are our key differentiators. Come be a... ...anywhere. We are building Detection & Response Engineering from the ground up: engineering-led,... ...IT, OT, and physical surfaces. As the Staff Incident Responder, you are the most senior...
Contract work, Local area
- Epoch Biodesign is looking for a Senior Staff Network Operations Engineer to ensure production reliability across its global network in San Francisco. This role drives incident response and sets operational standards for Crusoe's extensive AI infrastructure, requiring strong...
$200k - $250k
...infrastructure to ensure the platform is reliable, fast, and resilient as we scale. Role Mission Own service reliability end-to-end: prevent incidents, reduce blast radius when failures... ...command quality: Lead Sev1/Sev2 response end-to-end (containment, communications...
Permanent employment
$157.7k - $277.8k
...Location Type Hybrid Department Engineering, product & design... ...available, performant, and reliable, 24/7. As an Infrastructure... ...complex distributed systems Lead incident response, post-mortems, and root cause... ...building and operating large-scale, high-availability production...
Full time, Work at office, Local area, Flexible hours
- Overview Senior Platform & Reliability Engineer OpenArt is an AI Storytelling and Visual Creation... ...systems, not slices. Ship at real scale, your work goes to millions of users,... ...Participate in an on-call rotation and improve incident response (alert quality, runbooks, escalation...
Remote work, Worldwide, Visa sponsorship
- ...experience in Site Reliability Engineering, DevOps, or a... ...focused on large-scale production systems... ...operational automation, incident management, and... ...compute, networking, storage, and database... ...of Technical Staff, Cluster Management... ...Incident Management & Response: Lead efforts in...
$200k - $250k
...This hands-on technical leadership role demands expertise in service reliability to ensure the platform's performance as it scales. Responsibilities include setting reliability standards, managing incident responses, and driving architectural resilience using Kubernetes...
- Founding Platform & Reliability Engineer About OpenArt OpenArt is an AI Storytelling and Visual... ...real systems, not slices. Ship at real scale, your work goes to millions of users,... ...in an on-call rotation and lead incident response improvements (alert quality, runbooks...
Remote work, Worldwide, Visa sponsorship
- About the Team The Scaling team designs, builds, and operates critical... ...workloads, while remaining reliable and easy to use. About the... ...Site Reliability Engineer to own production-critical infrastructure... ..., and continuously improve incident response standards, on‑call practices...
$202.8k - $327.63k
...Director, SRE Platform Engineering is a senior engineering leader responsible for bringing production... ...Management (ITSM) and Site Reliability Engineering (SRE)... ...global workforce Evolve incident response into a highly... ...Developer Platforms (IDP) at scale Background in building...
Permanent employment, Contract work, Work at office, Local area, Remote work, 2 days per week
- A leading language learning platform is seeking an experienced SRE Engineer to ensure the reliability and resilience of their infrastructure. Responsibilities include leading incident response, improving observability, and collaborating with various teams to enhance platform...
- ...dynamic tech firm located in San Francisco is seeking a Site Reliability Engineer to enhance operational health across their production... ...You will manage production systems' reliability and lead incident response efforts to prevent issues, all while contributing to the...
- Epoch Biodesign in San Francisco is looking for a Staff Network Operations Engineer to enhance the reliability of their global network infrastructure. This role... ...a seasoned network engineer to handle production incidents, maintain high system availability, and optimize...
- A leading AI research company based in San Francisco is seeking experienced reliability engineers to scale their infrastructure and ensure system performance and reliability. This role involves collaborating with diverse teams to develop resilient systems and enhance operations...
- ...looking for a Systems Reliability Engineer to own the... ...warehouses. This role is responsible for making systems observable... ...and repeatable as we scale across deployments.... ...Define and improve incident response, severity levels... ...infrastructure, networking, and distributed systems...
Permanent employment
$150k - $250k
...As our Founding Security Reliability Engineer at Charta Health, you'll pioneer... ...opportunity to build and scale the foundational security... ...mitigation, and efficient incident response. You'll be crucial in engineering... ...(primarily AWS), including network security, identity and...
$293k - $385k
...The Infrastructure Engineering function sits within IT and is responsible for reliably building, deploying, and... ...operational leverage as OpenAI scales. About the Role... ..., Identity, and Network teams to ensure infrastructure... ..., alerting, and incident response mechanisms to...
Work at office
- ...of AI infrastructure: large-scale AI datacenters and the... ...Gimlet Labs is seeking a Network Engineer to design, build, and scale... ...operations teams to improve network reliability, deployment velocity,... ...deployment validation, and incident response workflows. You may be a...
- A technology solutions provider is looking for a Network Engineer to enhance and maintain a large-scale network. This role involves managing both wired and wireless infrastructures, conducting assessments, and ensuring network security. Candidates should have a degree...
$175k - $215k
...Software Reliability Engineer, Waymo Fleet Waymo is an autonomous driving... ...Engineers (SRE) are responsible for the stable operation of... ...techniques to build and run large-scale, fault-tolerant, reliable... ...infrastructure by leading incident response efforts. You'll participate...
Full time, Remote work
$195k - $235k
...urgency, who believe in the scale of our ambition and thrive on... ...: Crusoe Cloud is seeking a Staff Network Operations Engineer to help own production reliability across our global network infrastructure... ...ownership role focused on incident response, root cause analysis, and...
Temporary work, Worldwide
$225k - $275k
...Senior Staff Network Operations Engineer Crusoe Cloud is seeking a Senior Staff... ...Engineer to own production reliability across our global network... ...interconnects. You will drive incident response, root cause analysis, and... ...healthy at scale. This is a senior production...
- ...finance at a global scale. Proudly... ...support, IT engineering and business... ...access management. Network Operations... ...our Airwallex staff all over the globe... ...and ensures a reliable and stable... ...Francisco. Responsibilities: Build... ...issues, running an incident through to...
Work at office, Remote work, Worldwide, Flexible hours, Weekend work
$195k - $235k
...Staff Network Operations Engineer Crusoe Cloud is seeking a Staff Network Operations... ...to help own production reliability across our global network... ...role focused on incident response, root cause analysis, and... ...infrastructure running at scale. Your work will directly...
Temporary work, Worldwide
$150k - $170k
...Inc. is seeking an Integration Reliability Engineer in San Francisco, CA, responsible for ensuring the reliability of... ...observability tools and improve incident response processes. Qualifications... ...experience in SRE, strong Linux and networking skills, and familiarity with...

