Senior AI Infra SRE: Remote GPU Clusters & Performance
Cortes 23
- Remote job
Cortes 23 in San Francisco is seeking a Senior Site Reliability Engineer to design and operate large-scale GPU infrastructure. This high-impact role requires deep expertise in distributed systems and a proactive approach to incident management. The successful candidate will ensure reliability and performance, serving as a key technical liaison for customers managing large-scale AI workloads. The position offers the opportunity to shape foundational AI infrastructure within a dynamic team.
$250k
...engineer to design and maintain large-scale GPU clusters for training and inference. The candidate should have over 7 years in SRE or DevOps, with strong skills in... ...Experience with observability stacks and high-performance computing is preferred. The role offers an...
Senior, Performance
$272k - $431.25k
...Principal AI and ML Infra Software Engineer, GPU Clusters. We are seeking a Principal AI and ML Infra Software Engineer, GPU Clusters at NVIDIA to join... ...for such initiatives. Monitor and optimize the performance of our infrastructure, ensuring high availability, scalability...
Performance
$272k - $431.25k
NVIDIA Corporation seeks a Principal AI and ML Infra Software Engineer in Santa Clara, California... ...the efficiency of AI/ML research on GPU Clusters. The role involves collaboration with... ...teams, monitoring infrastructure performance, and implementing improvements based on...
Performance
$190k - $225k
We're hiring a senior PM to own technical... ...bare metal or GPU cloud PM might... ...engineers at AI Cloud operators... ...Customer-Facing Infra Experience: You... ...or high-performance computing environments... ...and we have a remote-first work culture... ..., 40M+ virtual clusters created since 2...
Remote work, Senior, Performance, Flexible hours
$168k - $258.75k
A leading AI technology company in Santa Clara is seeking a Senior Datacenter Technical Program Manager. In this role, you will drive the integration of cutting... ...candidate has 8+ years of experience in high-performance computing, excellent teamwork skills, and a background...
Remote job, Senior, Performance
$125k - $250k
...Senior Account Executive - GPU and AI Infrastructure. Location: Remote within the USA. Compensation: $125k-$250k base... ...some of the largest GPU clusters globally, we deliver high-performance GPU solutions that remove...
Remote work, Senior, Performance, Temporary work, Flexible hours
$250k
...a rapidly scaling AI cloud infrastructure... ...a next-generation GPU platform designed for... ...is looking for a Senior / Staff Site Reliability... ..., scalability, and performance of HPC and cloud... ...for GPU compute clusters Collaborate with... ...options Bonus Remote working option and...
Remote work, Senior, Performance, Permanent employment
$202.5k - $247.5k
...localhost or running AI workloads in... .... About the Infra Platform Team... ...production load. SRE and DevOps... ...develop by using remote development tools... ...full Kubernetes cluster of the ngrok... ...Compensation Job Title Senior Software... ...around for performance conversations.
Remote work, Senior, Performance, Permanent employment, Full time, Work at office, Local area, Home office, Flexible hours
- NVIDIA Corporation is hiring a Performance Engineer to conduct in-depth performance characterization on multi-GPU and multi-node clusters. The ideal candidate will have experience with parallel programming and performance benchmarking, and an understanding of computer system architecture...
Senior, Performance
$152k - $241.5k
...two decades. Our invention of the GPU in 1999 sparked the growth of the... ...GPU deep learning ignited modern AI - the next era of computing.... ...implementation of groundbreaking GPU compute clusters that run demanding deep learning, high performance computing, and computationally...
Remote work, Senior, Performance
- ...HPC-AI Engineer. NVIDIA is looking for an experienced HPC-AI... ...Engineer to join the Networking Clusters Solutions Infrastructure team.... ...artificial intelligence and GPU computing. Provide insights on... ...develop and bring up large scale performance platforms. What you will be...
Remote work, Senior, Performance
$152k - $241.5k
...in Artificial Intelligence, High Performance Computing and Visualization. The GPU, our invention, serves as the visual... ...on large multi-GPU and multi-node clusters. Study the interaction of our... ...existing vacancy. NVIDIA uses AI tools in its recruiting processes....
Remote work, Senior, Performance
$168k - $258.75k
Senior Datacenter Technical Program Manager, At-Scale AI Clusters... ...Santa Clara: US, CA, Remote: US, Remote... ...and deploy large scale GPU computing systems based... ...Experience with high-performance computing systems and...
Remote work, Senior, Performance, For contractors
$165k - $225k
...Moonlite delivers high-performance AI infrastructure for organizations... ...production-grade clusters from the ground up (not... ...for high-performance GPU interconnects, multi-... ...Experience: 5+ years in SRE, DevOps, or infrastructure... ...and success as we grow together...
Remote work, Senior, Performance, Flexible hours
$139k - $204k
...Senior Software Engineer, Cluster Orchestration. CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave... ...infrastructure performance with deep technical... ...across massive GPU clusters. By building... ...work environment, remote work may be considered...
Remote work, Senior, Performance, Permanent employment, Temporary work, Casual work, Work at office, Flexible hours
$180k - $300k
...the forefront of the AI revolution, offering an... ...including large-scale GPU clusters, cloud platforms, tools... ...Candidate Location: Remote U.S. Their mission is... ...We are seeking a Senior AI/ML Specialist Solutions... ...that maximize performance and business value Lead...
Remote work, Senior, Performance, Full time, Temporary work, Local area, Flexible hours
- ...company is seeking to enhance its enterprise AI mission systems by hiring a specialized... ...focused on designing and optimizing GPU clusters. In this role, you will be responsible for... ...clearance. Knowledge of Kubernetes and performance monitoring tools is highly desirable...
Senior, Performance
$131k - $175k
...Senior Hardware Systems Engineer – AI Rack & Cluster Infrastructure. Arista Networks is an industry... ...standards of quality and performance in everything we do.... ...cooling into high-density GPU environments, ensuring performance... ...to manage and work with remote vendors and integration...
Remote work, Senior, Performance, Flexible hours
- Krämer IT Solutions GmbH is looking for an AI Engineer / DevOps for our Saar-Cloud in Germany. You will build the engine room for the AI of tomorrow and optimize our GPU clusters for the best possible performance. You have experience with Docker and Kubernetes, and your tasks...
Remote job, Performance, Flexible hours
- ...leading tech firm is seeking a talented Senior Staff Software Engineer to design and... ...Data Center Compute racks. This remote role requires expertise in GPU programming and Linux driver development, with a focus on performance and efficiency. Candidates should have...
Remote work, Senior, Performance
- ...Engineer at a pioneering AI company, you'll be the... ...-edge Kubernetes GPU clusters; ensure swift and effective... ...; collaborate with senior leaders both internally... ...integration into high-performance computing (HPC)... ...flexibility in terms of remote work. The US base salary...
Remote work, Performance, Full time, Flexible hours, Night shift, Weekend work
- ...mission to democratize AI by breaking down the barriers... ...we offer an innovative GPU marketplace and AI... ...We're seeking a Senior Infrastructure Engineer... ...IPMI/Redfish, BMC-based remote management, PXE boot, and... ...Familiarity with high-performance networking technologies...
Remote work, Senior, Performance
- ...Senior Networking Test Engineer. We are looking for a Senior Networking Test Engineer... ...NVLink, Ethernet and InfiniBand-based AI clusters. Additionally, you will own complex issues... ...metrics and traces. Run Regression, Performance, Functional and Scale testing, analyze...
Remote work, Senior, Performance
$152k - $241.5k
...Senior Firmware Engineer. Do you excel at developing robust, secure... ...powers our next-generation GPU architectures. We are looking... ...in building robust, high-performance infrastructure for the future... ...May 26, 2026. NVIDIA uses AI tools in its recruiting processes...
Remote work, Senior, Performance
- ...Senior Software Engineer - Web Engine Team - Infra. Join the team redefining how the world experiences design. Hey... ...track record of diagnosing production performance issues. You are comfortable... ...Other stuff to know We see AI as a powerful amplifier of creativity...
Remote work, Senior, Performance, Work at office
- A healthcare technology company based in San Francisco is seeking an experienced Site Reliability Engineer (SRE) to ensure the reliability and performance of their systems. Candidates should have over 5 years of professional engineering experience, strong cloud environment...
Remote job, Senior, Performance, Flexible hours
$149.1k - $157.8k
...Tech Insights is hiring a Senior Site Reliability Engineer to build a foundation for an AI-first platform in the U.S. This senior... ...Candidates should have extensive SRE experience, AWS expertise, and a... ...or Engineering. The position is remote with occasional travel required,...
Remote work, Senior
$184k - $287.5k
...are seeking an ambitious Senior Solutions Architect - AI Factory Deployment to join... ...benchmarks on Linux-based GPU clusters, using NCCL and collectives... ...AllReduce and AllToAll to improve performance and scalability. As... ...performance engineering, SRE, or systems performance...
Remote work, Senior, Performance
$184k - $287.5k
..., we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self... ...in Artificial Intelligence (AI) and High Performance Computing. Join our team and help develop groundbreaking...
Remote work, Senior, Performance
$184k - $287.5k
...upon which every new AI-powered application... .... We are seeking a Senior Software Engineer... ...improve reliability, performance, and scale across... ...for NIMs, including GPU scheduling, autoscaling, and multi-cluster rollouts.... ...research, backend, SRE, and product teams...
Remote work, Senior, Performance


