Senior AI Infra SRE: Remote GPU Clusters & Performance

Cortes 23

Cortes 23 in San Francisco is seeking a Senior Site Reliability Engineer to design and operate large-scale GPU infrastructure. This high-impact role requires deep expertise in distributed systems and a proactive approach to incident management. The successful candidate will ensure reliability and performance, serving as a key technical liaison for customers managing large-scale AI workloads. The position offers the opportunity to shape foundational AI infrastructure within a dynamic team.

Vacancy posted 1 day ago
Similar jobs that could be interesting for you
Based on the Senior AI Infra SRE: Remote GPU Clusters & Performance in San Francisco, CA vacancy
  • $250k

     ...engineer to design and maintain large-scale GPU clusters for training and inference. The candidate should have over 7 years in SRE or DevOps, with strong skills in...  ...Experience with observability stacks and high-performance computing is preferred. The role offers an... 
    Senior
    Performance

    Hamilton Barnes Associates Limited

    San Francisco, CA
    16 hours ago
  • $272k - $431.25k

     ...Principal AI and ML Infra Software Engineer, GPU Clusters. We are seeking a Principal AI and ML Infra Software Engineer, GPU Clusters at NVIDIA to join...  ...for such initiatives. Monitor and optimize the performance of our infrastructure, ensuring high availability, scalability... 
    Performance

    NVIDIA

    Santa Clara, CA
    4 days ago
  • $272k - $431.25k

    NVIDIA Corporation seeks a Principal AI and ML Infra Software Engineer in Santa Clara, California...  ...the efficiency of AI/ML research on GPU Clusters. The role involves collaboration with...  ...teams, monitoring infrastructure performance, and implementing improvements based on... 
    Performance

    NVIDIA Corporation

    Santa Clara, CA
    2 days ago
  • $190k - $225k

    We're hiring a senior PM to own technical...  ...bare metal or GPU cloud PM might...  ...engineers at AI Cloud operators...  ...Customer‑Facing Infra Experience: You...  ...or high‑performance computing environments...  ...and we have a remote‑first work culture...  ..., 40M+ virtual clusters created since 2... 
    Remote work
    Senior
    Performance
    Flexible hours

    vCluster

    Dallas, TX
    4 hours ago
  • $168k - $258.75k

    A leading AI technology company in Santa Clara is seeking a Senior Datacenter Technical Program Manager. In this role, you will drive the integration of cutting...  ...candidate has 8+ years of experience in high-performance computing, excellent teamwork skills, and a background... 
    Remote job
    Senior
    Performance

    NVIDIA Corporation

    Santa Clara, CA
    4 days ago
  • $125k - $250k

     ...Senior Account Executive - GPU/AI Infrastructure. Senior Account Executive - GPU and AI Infrastructure. Location: Remote within the USA. Compensation: $125k-$250k base...  ...some of the largest GPU clusters globally, we deliver high-performance GPU solutions that remove... 
    Remote work
    Senior
    Performance
    Temporary work
    Flexible hours

    ESR Healthcare

    United States
    20 hours ago
  • $250k

     ...a rapidly scaling AI cloud infrastructure...  ...a next-generation GPU platform designed for...  ...is looking for a Senior / Staff Site Reliability...  ..., scalability, and performance of HPC and cloud...  ...for GPU compute clusters Collaborate with...  ...options Bonus  Remote working option and... 
    Remote work
    Senior
    Performance
    Permanent employment
    San Francisco, CA
    5 days ago
  • $202.5k - $247.5k

     ...localhost or running AI workloads in...  .... About the Infra Platform Team...  ...production load. SRE and DevOps...  ...develop by using remote development tools...  ...full Kubernetes cluster of the ngrok...  ...Compensation Job Title Senior Software...  ...around for performance conversations.
    Remote work
    Senior
    Performance
    Permanent employment
    Full time
    Work at office
    Local area
    Home office
    Flexible hours

    ngrok Inc.

    San Francisco, CA
    16 hours ago
  • NVIDIA Corporation is hiring a Performance Engineer to conduct in-depth performance characterization on multi-GPU and multi-node clusters. The ideal candidate will have experience with parallel programming and performance benchmarking, and an understanding of computer system architecture... 
    Senior
    Performance

    NVIDIA Corporation

    New Bremen, OH
    3 days ago
  • $152k - $241.5k

     ...two decades. Our invention of the GPU in 1999 sparked the growth of the...  ...GPU deep learning ignited modern AI - the next era of computing....  ...implementation of groundbreaking GPU compute clusters that run demanding deep learning, high-performance computing, and computationally... 
    Remote work
    Senior
    Performance

    NVIDIA

    United States
    5 days ago
  •  ...HPC-AI Engineer. NVIDIA is looking for an experienced HPC-AI...  ...Engineer to join the Networking Clusters Solutions Infrastructure team....  ...artificial intelligence and GPU computing. Provide insights on...  ...develop and bring up large-scale performance platforms. What you will be... 
    Remote work
    Senior
    Performance

    NVIDIA

    United States
    16 hours ago
  • $152k - $241.5k

     ...in Artificial Intelligence, High Performance Computing and Visualization. The GPU, our invention, serves as the visual...  ...on large multi-GPU and multi-node clusters. Study the interaction of our...  ...existing vacancy. NVIDIA uses AI tools in its recruiting processes.... 
    Remote work
    Senior
    Performance

    NVIDIA

    Santa Clara, CA
    2 days ago
  • $168k - $258.75k

    Senior Datacenter Technical Program Manager, At-Scale AI Clusters. Senior Datacenter Technical...  ...Santa Clara: US, CA, Remote: US, Remote. Time type:...  ...and deploy large-scale GPU computing systems based...  ...Experience with high-performance computing systems and... 
    Remote work
    Senior
    Performance
    For contractors

    NVIDIA Corporation

    Santa Clara, CA
    4 days ago
  • $165k - $225k

     ...Moonlite delivers high-performance AI infrastructure for organizations...  ...production-grade clusters from the ground up (not...  ...for high-performance GPU interconnects, multi-...  ...Experience: 5+ years in SRE, DevOps, or infrastructure...  ...and success as we grow together. #li-remote... 
    Remote work
    Senior
    Performance
    Flexible hours

    Moonlite

    Chicago, IL
    10 days ago
  • $139k - $204k

     ...Senior Software Engineer, Cluster Orchestration. CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave...  ...infrastructure performance with deep technical...  ...across massive GPU clusters. By building...  ...work environment, remote work may be considered... 
    Remote work
    Senior
    Performance
    Permanent employment
    Temporary work
    Casual work
    Work at office
    Flexible hours

    CoreWeave

    Sunnyvale, CA
    1 day ago
  • $180k - $300k

     ...the forefront of the AI revolution, offering an...  ...including large-scale GPU clusters, cloud platforms, tools...  ...Candidate Location: Remote U.S. Their mission is...  ...We are seeking a Senior AI/ML Specialist Solutions...  ...that maximize performance and business value Lead... 
    Remote work
    Senior
    Performance
    Full time
    Temporary work
    Local area
    Flexible hours

    TieTalent

    Hardwick, VT
    1 day ago
  •  ...company is seeking to enhance its enterprise AI mission systems by hiring a specialized...  ...focused on designing and optimizing GPU clusters. In this role, you will be responsible for...  ...clearance. Knowledge of Kubernetes and performance monitoring tools is highly desirable.
    Senior
    Performance

    RPMGlobal

    Bethesda, MD
    16 hours ago
  • $131k - $175k

     ...Senior Hardware Systems Engineer – AI Rack & Cluster Infrastructure. Arista Networks is an industry...  ...standards of quality and performance in everything we do....  ...cooling into high-density GPU environments, ensuring performance...  ...to manage and work with remote vendors and integration... 
    Remote work
    Senior
    Performance
    Flexible hours

    Arista Networks, Inc.

    Santa Clara, CA
    3 days ago
  • Krämer IT Solutions GmbH is looking for an AI Engineer / DevOps for our Saar-Cloud in Germany. You will build the engine room for the AI of tomorrow and optimize our GPU clusters for the best possible performance. You have experience with Docker and Kubernetes, and your tasks... 
    Remote job
    Performance
    Flexible hours

    Server Eye

    New Bremen, OH
    3 days ago
  •  ...leading tech firm is seeking a talented Senior Staff Software Engineer to design and...  ...Data Center Compute racks. This remote role requires expertise in GPU programming and Linux driver development, with a focus on performance and efficiency. Candidates should have... 
    Remote work
    Senior
    Performance

    Confidential Company

    Richardson, TX
    4 days ago
  •  ...Engineer at a pioneering AI company, you'll be the...  ...-edge Kubernetes GPU clusters; ensure swift and effective...  ...; collaborate with senior leaders both internally...  ...integration into high-performance computing (HPC)...  ...flexibility in terms of remote work. The US base salary... 
    Remote work
    Performance
    Full time
    Flexible hours
    Night shift
    Weekend work

    Together AI

    San Francisco, CA
    3 days ago
  •  ...mission to democratize AI by breaking down the barriers...  ...we offer an innovative GPU marketplace and AI...  ...We're seeking a Senior Infrastructure Engineer...  ...IPMI/Redfish, BMC-based remote management, PXE boot, and...  ...Familiarity with high-performance networking technologies... 
    Remote work
    Senior
    Performance

    Hyperbolic Labs

    San Francisco, CA
    2 days ago
  •  ...Senior Networking Test Engineer. We are looking for a Senior Networking Test Engineer...  ...NVLink, Ethernet and InfiniBand-based AI clusters. Additionally, you will own complex issues...  ...metrics and traces. Run regression, performance, functional and scale testing, analyze... 
    Remote work
    Senior
    Performance

    NVIDIA

    United States
    3 days ago
  • $152k - $241.5k

     ...Senior Firmware Engineer. Do you excel at developing robust, secure...  ...powers our next-generation GPU architectures. We are looking...  ...in building robust, high-performance infrastructure for the future...  ...May 26, 2026. NVIDIA uses AI tools in its recruiting processes... 
    Remote work
    Senior
    Performance

    NVIDIA

    United States
    5 days ago
  •  ...Senior Software Engineer - Web Engine Team - Infra. Join the team redefining how the world experiences design. Hey...  ...track record of diagnosing production performance issues. You are comfortable...  ...Other stuff to know: We see AI as a powerful amplifier of creativity... 
    Remote work
    Senior
    Performance
    Work at office

    Canva

    United States
    3 days ago
  • A healthcare technology company based in San Francisco is seeking an experienced Site Reliability Engineer (SRE) to ensure the reliability and performance of their systems. Candidates should have over 5 years of professional engineering experience, strong cloud environment... 
    Remote job
    Senior
    Performance
    Flexible hours

    Plenful

    San Francisco, CA
    3 days ago
  • $149.1k - $157.8k

     ...Tech Insights is hiring a Senior Site Reliability Engineer to build a foundation for an AI-first platform in the U.S. This senior...  ...Candidates should have extensive SRE experience, AWS expertise, and a...  ...or Engineering. The position is remote with occasional travel required,... 
    Remote work
    Senior

    Tech Insights

    New York, NY
    2 days ago
  • $184k - $287.5k

     ...are seeking an ambitious Senior Solutions Architect - AI Factory Deployment to join...  ...benchmarks on Linux-based GPU clusters, using NCCL and collectives...  ...AllReduce and AllToAll to improve performance and scalability. As...  ...performance engineering, SRE, or systems performance... 
    Remote work
    Senior
    Performance

    NVIDIA

    United States
    5 days ago
  • $184k - $287.5k

     ..., we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self...  ...in Artificial Intelligence (AI) and High Performance Computing. Join our team and help develop groundbreaking... 
    Remote work
    Senior
    Performance

    NVIDIA

    United States
    2 days ago
  • $184k - $287.5k

     ...upon which every new AI-powered application...  .... We are seeking a Senior Software Engineer...  ...improve reliability, performance, and scale across...  ...for NIMs, including GPU scheduling, autoscaling, and multi-cluster rollouts....  ...research, backend, SRE, and product teams... 
    Remote work
    Senior
    Performance

    NVIDIA

    United States
    4 days ago
