Senior Network & Site Reliability Engineer
Alembic, Inc.
About Us

Alembic is the pioneering Causal AI platform. We help the world's largest enterprises move past correlation to prove what actually drives business outcomes: the question marketing and growth teams have never been able to answer with confidence. Fortune 100 companies including Nvidia, Delta Air Lines, and Mars use Alembic to make multimillion-dollar decisions on trusted, causal evidence. We're backed by a $145M Series B from WndrCo (founded by Jeffrey Katzenberg), Jensen Huang, Joe Montana, Prysm Capital, and Accenture. Our models run on our own NVIDIA DGX SuperPOD built on Grace Blackwell infrastructure, one of the fastest private supercomputers in the world. (We've melted GPUs getting here.)

About the Role

We're building infrastructure that has to perform under real-world scale, reliability, and security demands, and we're looking for an engineer who wants to own the foundation it runs on. This isn't a traditional "keep the lights on" role. You will design and operate the global network and reliability layer behind one of the world's fastest private supercomputers: the fabric powering distributed compute, ML workloads, real-time analytics, and mission-critical enterprise systems. You'll work across networking, systems, automation, observability, and reliability engineering to scale a platform where performance genuinely matters, with real influence over architecture decisions. It's a strong fit if you like solving deep infrastructure problems, building resilient systems, automating everything repetitive, and owning architecture rather than just maintaining it.

What You'll Do

- Architect and operate scalable, secure network architecture for high-security requirements and large-scale machine learning workloads.
- Own network device configuration management end to end, ensuring consistency and reliability across the fleet.
- Improve system and network reliability and performance through automation, observability, and proactive capacity planning.
- Implement and manage complex network protocols and connectivity, including BGP, VPNs, WAN circuits, and external peering.
- Build and maintain comprehensive monitoring, alerting, and incident response (SLOs, runbooks, and on-call rotations), and drive post-incident analysis and continuous improvement.
- Ensure security, compliance, and operational readiness across our network and cloud infrastructure.
- Partner across engineering and data science to drive a culture of performance and reliability.

What Will Help You Succeed

- 8+ years in network or infrastructure engineering, including 5+ years in datacenter operations and/or systems and network administration.
- A strong background in network security, architecture, design, and operations.
- Extensive hands-on experience with network devices (firewalls, switches, load balancers) and large-scale architectures and protocols: BGP, QoS, MPLS, and IPsec VPNs.
- Experience designing and operating modern datacenter network fabrics (spine-leaf, EVPN/VXLAN, ECMP).
- Network automation and IaC tooling (Ansible, Terraform, Nornir, or similar), plus IPAM/DCIM platforms (NetBox, Infoblox, or similar).
- WAN engineering: carrier circuit provisioning and external network peering.
- Familiarity with Kubernetes networking (CNI plugins, ingress, service networking, network policy) and strong operational experience with Linux-based production infrastructure.
- Experience with monitoring and observability stacks (Prometheus, Grafana, Datadog, ELK, OpenTelemetry).
- Solid scripting (Python, Bash) to debug complex network and system issues and automate solutions, plus excellent cross-functional communication.

Also Helpful

- NVIDIA networking technologies: Cumulus Linux, InfiniBand, Spectrum-X, and BlueField DPUs (this is the fabric behind our SuperPOD).
- Familiarity with data-intensive platforms (Spark, Airflow, Kafka) and storage network protocols (NFS, LustreFS, iSCSI).
- Security practices for applications and infrastructure, and experience in high-compliance or SOC 2 environments.

The Role Is Right for You If

- You want to own mission-critical network and infrastructure end to end, from architecture to incident management, not just keep it running.
- You'd rather build and automate than direct from a distance, and you want meaningful influence over how a high-performance platform scales.

Why You Might Be Excited About Alembic

- Hard problems with real impact: You'll own the network and reliability layer behind systems that influence multimillion-dollar decisions at Fortune 100 companies.
- Cutting-edge technology: Operate our own NVIDIA DGX SuperPOD on Grace Blackwell, one of the fastest private supercomputers in the world, and run a fabric (InfiniBand, Spectrum-X, BlueField) almost no company has in-house.
- Technical autonomy: Ownership over architecture decisions and the freedom to solve hard infrastructure problems your way.
- Elite team: Join top engineers who thrive on hard problems and high-impact work.
- Series B momentum, real ownership: Meaningful equity at a Series B company that's raised $145M, with proven product-market fit and Fortune 100 traction.

Why You Might Not Be Excited

- If you only want to tell people what to build instead of building and automating alongside them, this isn't the environment for you.
- You prefer companies with 100% built-out process for every detail.
- You prefer static over dynamic; projects and priorities adapt as we grow.

We have real paying customers and a playbook, and we still move at startup speed at Series B scale.
$227.2k - $324.5k



