
Senior/Staff Infrastructure Engineer San Francisco

$180k - $250k

Fal

You are a hands-on engineer who builds the software and processes that keep a large fleet of GPU servers healthy and productive. You write systems and tooling for managing thousands of servers, covering provisioning, health monitoring, error detection, and recovery. When something breaks that automation can't fix, you drive resolution with partners.

Key responsibilities
  • Build and maintain a Python fleet-tracking system that manages the full lifecycle of servers, including contracting and procurement, target use, pricing, availability, health, RMAs, etc.
  • Build server management tooling that automates provisioning, health checks, GPU diagnostics, recovery, and alerting
  • Create and maintain metrics, dashboards, and alerting for hardware health across the fleet (GPU errors, disk failures, network issues, thermals)
  • Leverage AI aggressively to build tools and automate alerting and recovery
  • Implement and enforce OS-level security: hardening baselines, SELinux/AppArmor policies, SSH key management, vulnerability scanning, and compliance automation
  • Manage and optimize distributed and local storage systems supporting model weights, checkpoints, and ephemeral scratch space: NVMe arrays, NFS, parallel file systems, and object storage
  • Tune Linux systems for AI workloads: kernel parameters, NUMA topology, CPU pinning, hugepages, I/O schedulers, and GPU driver stack optimization (NVIDIA drivers, CUDA, container runtimes)
  • Develop a suite of automated error detection and recovery processes
  • Work with partners to solve technical issues

Requirements
  • 5+ years of experience managing bare-metal and VM server fleets at scale (100+ nodes)
  • Strong software engineering skills in Python; you write production tooling, not scripts
  • Deep Linux systems knowledge: boot process, kernel tuning, networking, storage, systemd, cgroups, namespaces, performance profiling
  • Strong experience with configuration management and infrastructure-as-code: Ansible, Terraform, cloud-init
  • Solid understanding of storage technologies: LVM, RAID, NVMe, NFS, Lustre or GPFS, and Linux I/O stack tuning
  • Familiarity with hardware diagnostics and failure modes (GPUs, NVMe, NICs, memory)
  • Experience building internal tools or dashboards for infrastructure visibility
  • Excellent communication and the ability to drive technical decisions across teams
  • Self-starter who executes quickly, takes ownership, and constantly seeks improvement

Nice to have
  • Familiarity with network configuration and diagnostics (VLAN, VXLAN, ECMP, BGP, tcpdump)
  • Experience with NVIDIA GPU infrastructure: driver management, health monitoring, DCGM, NVLink/NVSwitch diagnostics, RDMA, InfiniBand/RoCEv2
  • Experience with AMD GPUs
  • Experience with bare-metal and VM provisioning (PXE/iPXE, Kickstart, libvirt, QEMU/KVM)
  • Experience with compliance frameworks relevant to cloud providers (SOC 2, ISO 27001)

Compensation
$180,000 to $250,000, plus equity and benefits

Location
We are currently hiring in downtown San Francisco. We offer visa sponsorship and will help you relocate to San Francisco.

What we offer at fal
  • Interesting and challenging work
  • A lot of learning and growth opportunities
  • Health, dental, and vision insurance (US)

Vacancy posted 3 days ago
