Network Engineer, Reliability & Observability

$150k - $250k

Fluidstack

Fluidstack is seeking a Network Engineer, Reliability & Observability to serve as a reliability engineer who champions and builds the processes, data collection, and reliability metrics needed to improve the quality and reliability of AI networks from deployment through the full lifecycle of operations.

This role is focused on developing processes, systems, tools, data and data pipelines, and observability to improve the quality of networks and deliver automated metrics (24x7) as well as periodic reliability reports for both internal and external customers.

This role is ideal for experienced network operators who are passionate about reliability and have experience designing and building full-lifecycle software such as Quality Assurance audits, circuit audits, periodic audits, failure rates, and failure analysis. You are passionate about hardware (electronics and optics) and software development, and you value and promote the use of data to make informed decisions in deployment, operations, and strategic sourcing.

Experienced Site Reliability Engineers (SREs) with a passion for networking are encouraged to apply.

Focus
  • Ownership of Quality Assurance: Design, develop, and support QA process for network hardware and networks.

  • Pipelines: Develop and deploy serverless, server-based, and manually triggered data pipelines that produce network quality and reliability observability for internal and external customers.

  • Deployment and Operations Support: Support full-lifecycle data collection and analysis, partnering with Deployment, Operations, DC hardware, and logistics teams to produce data that drives process improvements and delivers on SLAs and SLOs.

  • Process Engineering: Develop, pilot, and deploy process improvements for deployment and repair that both produce data and consume it with Machine Learning to fulfill our mission.

  • Cross-Team Collaboration: Own without ego and execute in a collaborative team with design, deployment, and operations engineers and software developers.

  • Subject Matter Expert: Deep expertise in at least two subjects such as IP routing, optics, optical transport, Ethernet, RDMA/RoCE, or electrical power.

About You
  • Strong Operations Background: 5+ years in network engineering and at least 3 years in operations with significant hands-on operational experience. You've run production networks or compute, responded to incidents at all hours, and debugged complex failures under pressure. You understand the difference between "working" and "production-ready".

  • Software Development: You have experience with ITIL, Agile (XP), and TDD, including developing and leading programs and projects. You have experience building hyperscale platforms, demonstrating fluency in Golang with supporting tools in Python or Rust.

  • Datacenter Fabric Expertise: Deep experience operating modern datacenter networks including EVPN/VXLAN, BGP, Clos topologies, and high-radix switching. You're comfortable troubleshooting Layer 2/3 issues, BGP routing problems, fabric misconfigurations, and physical media failures.

  • Incident Response Excellence: Proven ability to lead incident response, perform systematic troubleshooting, and drive issues to resolution. You remain calm during outages, communicate clearly with stakeholders, and know when to escalate versus when to dig deeper. You've been the person others call when things break.

  • Matrix Leadership Experience: You understand how to build relationships with onsite teams, coordinate physical infrastructure work, and represent network engineering in a field environment. You know how to get things done in operational settings with many internal and external teams and stakeholders.

  • Operational Pragmatism: You balance perfection with progress. You can troubleshoot with imperfect information, make pragmatic decisions under time pressure, and prioritize based on business impact. You document as you go and continuously improve operational processes.

  • Self-Driven: You embrace complex challenges with undefined processes and key results. You can dive in to learn, then zoom back out to build Objectives, develop Key Results, and set up a software development project and pipeline in Jira on your own. You can then switch hats and begin coding.

  • Travel: You are willing and able to travel to spend time with the team at our local offices or data center locations, up to 20% of the time.

Nice to Haves
  • AI/HPC Fabric Operations: Experience operating AI/ML or HPC fabrics with RDMA (RoCEv2), lossless Ethernet (PFC, ECN), or high-performance networking. You understand the operational precision required when network performance directly impacts workload completion.

  • Reliability Engineering: You have experience with observability and reliability engineering from network operations or in manufacturing quality.

  • Hardware Repair Experience: Hands-on experience coordinating hardware repairs, RMAs, and physical infrastructure work. You understand datacenter logistics, vendor escalation processes, and how to work effectively with onsite technicians.

  • Observability & Monitoring: Familiarity with network monitoring platforms, alerting systems, and telemetry collection. You've used monitoring tools to diagnose issues proactively and tune alerting to reduce noise. You have experience with SQL, MySQL, and building operations dashboards.

Salary & Benefits
  • Competitive total compensation package (salary + equity).

  • Retirement or pension plan, in line with local norms.

  • Health, dental, and vision insurance.

  • Generous PTO policy, in line with local norms.

The base salary range for this position is $150,000 - $250,000 per year, depending on experience, skills, qualifications, and location. This range represents our good faith estimate of the compensation for this role at the time of posting. Total compensation may also include equity in the form of stock options.

We are committed to pay equity and transparency.

Fluidstack is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans' status, or any other characteristic protected by law. Fluidstack will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

You will receive a confirmation email once your application has been successfully accepted. If there is an error with your submission and you did not receive a confirmation email, please email us with your resume/CV, the role you've applied for, and the date you submitted your application. Someone from our recruiting team will be in touch.

Vacancy posted 5 days ago