Network Reliability & Observability Engineer

Fluidstack

A leading infrastructure company is seeking a Network Engineer, Reliability & Observability to enhance AI network reliability. This role involves developing QA processes, serverless workflows, and collaborating with cross-functional teams. Ideal candidates have over 5 years in networking, strong incident response skills, and experience with data center networks. A passion for hardware, software development expertise, and strong problem-solving abilities are essential. This position is based in San Francisco, California and supports a culture of excellence.

Vacancy posted 3 days ago
Similar jobs that could be interesting for you
Based on the Network Reliability & Observability Engineer in San Francisco, CA vacancy
  •  ...home day is currently Tuesday. Engineering at Lambda is responsible for...  ...’ll Do Deploy and operate observability platforms for logging,...  ...adoptable and improve product reliability. Lead members of other engineering...  ...monitoring or network monitoring Experience with Prometheus... 
    Network
    Work at office
    Local area
    Work from home

    Lambda

    San Francisco, CA
    4 days ago
  • $147k - $202k

     ...Overview: We are seeking a highly technical Staff Observability Site Reliability Engineer with a specialty in Splunk to own and evolve our...  ...Distributed Systems: Deep understanding of Linux internals, networking (TCP/IP, DNS, Load Balancing), and container... 
    Network
    Permanent employment
    Work at office
    Local area
    Worldwide
    Flexible hours

    Okta

    San Francisco, CA
    a month ago
  • We’re looking for a Systems Reliability Engineer to own the reliability of our system across cloud...  ...is responsible for making systems observable, diagnosable, and repeatable as we scale...  ...issues across infrastructure, networking, and distributed systems Partner with... 
    Network
    Permanent employment

    Claryo

    San Francisco, CA
    5 days ago
  •  ...re hiring an SRE to join our engineering team at Plenful and take ownership of the reliability and performance of the systems...  ...you’ll do Reliability, Observability and Performance: Maintain...  ...resource usage across compute, networking and storage. Security, Compliance... 
    Network
    Work at office
    Remote work
    Flexible hours
    2 days per week

    Plenful

    San Francisco, CA
    10 hours ago
  • $293k - $385k

     ...Team The Infrastructure Engineering function sits within IT and is responsible for reliably building, deploying, and operating...  ...IT, Security, Identity, and Network teams to ensure infrastructure...  ...Ensure automation is safe, observable, and resilient under failure conditions... 
    Network
    Work at office

    OpenAI

    San Francisco, CA
    3 days ago
  • $150k - $250k

    Founding Security Reliability Engineer Location: San Francisco - In office. Employment: Full-time...  ...(primarily AWS), including network security, identity and access management...  ...secrets management solutions. Security Observability & Monitoring: Establish comprehensive... 
    Network
    Full time
    Work at office

    Madrona Venture Labs

    San Francisco, CA
    3 days ago
  •  ...About the Role We're seeking a Site Reliability Engineer to ensure Hyperbolic's GPU marketplace...  ...capacity across our distributed GPU network, and implementing secure rollout and...  ...automated rollback mechanisms Proficient in observability tools and practices including metrics... 
    Network

    deCircle

    San Francisco, CA
    4 days ago
  •  ...information, please read our...  ...Senior Site Reliability Engineer...  ...teams to ensure systems are resilient (observable, fault-tolerant, recoverable, scalable...  ...experience. Advanced knowledge of Linux, Networking, and Containers. Proficiency in at least... 
    Network
    Immediate start
    Remote work
    Worldwide

    OutSystems

    San Francisco, CA
    4 days ago
  •  ...Site Reliability Engineer - AI Infrastructure Location: Global Remote / San Francisco · Full-Time...  ...been quietly building the systems, network, and orchestration layer that makes the...  ...implement monitoring, alerting, and observability for critical systems. Collaborate... 
    Network
    Full time
    Remote work

    Andromeda Cluster

    San Francisco, CA
    3 days ago
  •  ...onboard services and teams to the reliability tenets. Establish and...  ...development teams to build resilient, observable, fault‑tolerant, recoverable...  ...in Site Reliability Engineering, managing infrastructure and...  ...knowledge of Linux, networking, and containers. Proficiency... 
    Network

    OutSystems, Inc.

    San Francisco, CA
    2 days ago
  •  ...We're looking for a world-class Site Reliability Engineer to ensure the reliability, performance...  ...our reliability posture end-to-end—observability, performance tuning, incident ops, infrastructure...  ...and scale ops. Work across compute, networking, storage, and sandboxed execution... 
    Network

    Blaxel

    San Francisco, CA
    3 days ago
  • $140k - $220k

    About the Job You’ll own reliability and operational excellence for...  ...that makes the entire engineering team more effective, establish...  ...Deep AWS expertise (ECS, RDS, networking, security) Strong...  ...monitoring, alerting, and observability systems from first principles... 
    Network

    Pylon

    San Francisco, CA
    5 days ago
  • We are seeking a Sr. Site Reliability Engineer to join our team and run critical infrastructure...  ...validator nodes for multiple blockchain networks. You’ll also provide guidance and...  ...experiences. They enjoy building testing and observability capabilities that will accelerate the... 
    Network
    Remote job

    Blockchain Works

    San Francisco, CA
    9 days ago
  • $125k - $195k

     ...team of exceptional, hands-on engineers to make this happen....  ...seeking an Infrastructure & Site Reliability Engineer to design, build,...  ...Design and setup low level networking components, e.g., service...  ...compatible storage, VPNs Scale our observability platform: Build systems to... 
    Network
    Work at office
    Visa sponsorship
    Night shift

    Atomicsemi

    San Francisco, CA
    4 days ago
  • $138k - $179k

     ...proactively support using improved automation, observability and tooling. We are responsible for...  ...of other teams from infrastructure and engineering, to QA and business teams, so strong...  ...for achieving results. A global network of talented colleagues, who inspire, support... 
    Network
    Flexible hours

    MSCI

    San Francisco, CA
    3 days ago
  •  ...daily users while enabling our engineering teams to ship fast. You'll...  ...and tooling that improves reliability and partnering with engineering...  ...to design systems that are observable, resilient, and easy to...  ...infrastructure including compute, networking, databases, and managed... 
    Network
    Work at office
    Work from home

    gamma.app

    San Francisco, CA
    5 days ago
  •  ...significantly outperforms individual engineers. We combine language models...  ...seeking an experienced Site Reliability Engineer to join our...  ...comprehensive monitoring, alerting, and observability solutions using Datadog and...  ...reporting Design secure network architectures including VPC... 
    Network

    CodeRabbit

    San Francisco, CA
    2 days ago
  • $150k - $170k

    Claryo, Inc. is seeking an Integration Reliability Engineer in San Francisco, CA, responsible for...  ...candidate will build and maintain observability tools and improve incident response processes...  ...experience in SRE, strong Linux and networking skills, and familiarity with... 
    Network

    Claryo, Inc.

    San Francisco, CA
    4 days ago
  • $127k - $249k

    The Team Platform Engineering is the department within SRE that is...  ...Kubernetes infrastructure, networking, load balancing (including...  ...internal service mesh), and observability and alerting systems. The...  ...components that ensure cluster reliability and security (e.g., CoreDNS... 
    Network
    Work at office
    Local area
    Remote work
    Worldwide
    Flexible hours

    MongoDB

    San Francisco, CA
    4 days ago
  • $250k

     ...platform spanning infrastructure, networking, and orchestration....  ...Kubernetes environments. Develop observability, alerting, and auto-healing...  ...code, CI/CD pipelines, and reliability standards across thousands...  ...DevOps, or Infrastructure Engineering roles supporting large-... 
    Network
    Immediate start

    Hamilton Barnes Associates Limited

    San Francisco, CA
    5 days ago
  • CloudDevs: Senior Site Reliability Engineer (SRE) CloudDevs works with fast-moving, venture...  ...system reliability, efficiency, and observability. Outline and monitor SLIs, SLOs, and...  ...debugging expertise throughout providers, networking, and knowledge layers. Hands-on... 
    Network

    The10minutecareersolution

    San Francisco, CA
    4 days ago
  •  ...and Azure. Building reusable Terraform components (networking, IAM, secrets). Wiring up observability and tightening the loop between infra change and production...  ...customers. We're looking for an infrastructure engineer who actually wants to live in Terraform and... 
    Network
    Live in
    Work from home

    E2B

    San Francisco, CA
    3 days ago
  • $227.2k - $324.5k

     ...About the Role: Site Reliability Engineering (SRE) at Tubi is not a traditional operations...  ...technical strategy and vision for Tubi's observability, and automation platforms. Partner...  ...of AWS services (especially networking, IAM, EKS, ALBs/NLBs, Route 53, CloudWatch... 
    Network
    Full time
    Contract work
    Temporary work
    Local area
    Flexible hours

    Tubi

    San Francisco, CA
    1 day ago
  • $238k - $290k

     ...Role Overview As a Staff Software Engineer on the Site Reliability team at Harvey, you will ensure the...  ...infrastructure resources (compute, storage, networking) across 50+ global regions Lead...  ..., etc.) Deep familiarity with observability tools (Datadog, Sentry, etc.) and... 
    Network
    Relocation package

    HARVEY

    San Francisco, CA
    4 days ago
  •  ...a Sr. Staff Infrastructure Engineer at Twelve Labs, you will combine...  .... Own key tradeoffs across reliability, cost, and velocity, making...  ...Strong fundamentals in OS, networking, storage, and compute....  ...Infrastructure as Code, CI/CD, and observability (e.g., Terraform, GitHub... 
    Network
    H1b
    Work at office
    Worldwide
    Visa sponsorship
    Flexible hours

    Twelve-Labs

    San Francisco, CA
    5 days ago
  • $151k - $297k

    The Team Platform Engineering is the department within SRE that is...  ...Kubernetes infrastructure, networking, load balancing (including...  ...internal service mesh), and observability and alerting systems. The...  ...components that ensure cluster reliability and security (e.g., CoreDNS... 
    Network
    Local area
    Immediate start
    Remote work
    Flexible hours
    Shift work

    MongoDB

    San Francisco, CA
    1 day ago
  • Senior Site Reliability Engineer - AI Infrastructure Location: Global Remote / San Francisco •...  ...have been quietly building the systems, network, and orchestration layer that makes...  ...that degrade collective operations. Observability: Build deep visibility into GPU utilization... 
    Network
    Full time
    Remote work

    Cortes 23

    San Francisco, CA
    2 days ago
  • $300 per month

     ...On-site Department Cloud Engineering Crusoe's mission is to accelerate...  ...Role As a Principal Site Reliability Engineer, you will play a...  ...Architect and improve observability systems (metrics, logs, tracing...  ...with Infrastructure, Networking, Hardware, and Platform teams... 
    Network
    Full time
    Temporary work

    Epoch Biodesign

    San Francisco, CA
    2 days ago
  • $181k - $263k

    Senior Staff Site Reliability Engineer. Location: San Francisco...  ...across its premier global network of top-quality partners...  ...Staff Site Reliability Engineer who will set the technical direction...  ...across teams. Expertise in observability engineering: SLOs, SLI... 
    Network
    Work from home
    Flexible hours
    Night shift

    LiveRamp

    San Francisco, CA
    2 days ago
  • $210k - $310k

     ...tremendous growth of the Stellar blockchain network, an open-source platform that operates...  ...SDF is looking for a Director of Site Reliability Engineering to lead a small, high-leverage SRE...  ...SDF engineering teams build, deploy, observe, and operate software with confidence.... 
    Network
    Temporary work
    Work at office
    Local area
    Worldwide
    Flexible hours

    Stellar

    San Francisco, CA
    1 day ago
