AI Infra & Cluster Engineer — Scale GPU/CPU Orchestration
Linuxcareers
Linuxcareers is seeking an Infrastructure/Cluster Engineer to design and operate large-scale clusters that enable AI inference at scale. The role focuses on managing diverse hardware architectures and building robust infrastructure. The ideal candidate will possess deep expertise in Linux systems, automation tools, and orchestration technologies. Responsibilities include debugging performance issues and designing observability systems for cluster health. Experience with GPU infrastructure is a plus.
- A cutting-edge tech company in San Francisco seeks infrastructure engineers to enhance the tooling and systems that power its AI applications. Responsibilities include building GPU orchestration, scaling cloud batch job systems, and designing efficient scheduling software... Visa sponsorship
$300k
Albert Bow is seeking a Founding Engineer to design and scale their distributed systems for autonomous AI agents. With a salary of up to $300,000 and equity, you will have the opportunity to join an experienced founding team at a rapidly growing venture-backed AI startup...
- Sciforium is looking for a Senior HPC & GPU Infrastructure Engineer to oversee our GPU compute cluster’s health, reliability, and performance. This role involves hands-on Linux systems engineering, GPU driver management, and maintaining machine learning software stacks... Flexible hours
- A leading AI technology company in San Francisco is looking for a Senior Software Engineer to build scalable infrastructure for large‑scale training and fine-tuning of foundation models. You will design... ...training systems and optimize GPU utilization while collaborating with...
- ...A leading AI acceleration company in San Francisco is seeking a GPU Kernel Engineer to optimize performance for machine learning models. You will be responsible for designing high-performance GPU kernels and using advanced techniques to boost computation efficiency. Ideal...
- ...innovative company is seeking a talented software engineer to join their dynamic Inference team. This... ...and implementing infrastructure for large-scale multimodal models, focusing on high-... ...product teams to push the boundaries of AI technology, ensuring reliable production services...
- Nooks in San Francisco is seeking a Senior Engineer to build infrastructure that enhances the efficiency of multiple product teams. The... ...engineering experience, particularly in distributed systems and scaling production environments. Candidates should be comfortable... Work at office, 3 days per week
- A high-growth AI startup in San Francisco is seeking a Software Engineer (Infrastructure) to design and scale Kubernetes systems for a rapidly expanding platform. You will be responsible for leading technical deployments for enterprise clients and developing secure execution...
$250k
...opportunities? Join a rapidly scaling AI cloud infrastructure provider building a next-generation GPU platform designed for AI... ...Senior / Staff Site Reliability Engineer to support and scale large-scale... ...monitoring frameworks for GPU compute clusters Collaborate with ML, data,... Permanent employment, Remote work, $190k - $270k
AI Chopping Block, Inc. is seeking an experienced AI Infrastructure Engineer to manage user-facing services and production systems. The role encompasses participating in on-call rotations, building infrastructure with tools like Ansible, Terraform, and Kubernetes, and... Full time, Internship, $180k - $250k
A tech innovation company is looking for a hands-on engineer in San Francisco to manage a vast fleet of GPU servers. You will build systems for tracking server lifecycle, automate provisioning and health checks, and ensure OS-level security. The role requires 5+ years of... $180k - $250k
...next generation of AI products. We build... ...production, and do it at scale without compromise.... ...inference, orchestration, and observability... ...experienced software engineer who thrives on building... ..., scheduling, GPU autoscaling, large... ...and tune low level CPU and memory performance... Currently hiring, Remote work, Relocation package, $170k - $230k
...Site Reliability Engineer (SRE) Palo Alto... ...Mithril is an AI infrastructure platform... ...platform built to make GPU compute more... ...shape how Mithril scales its platform across... ...Mithril's global GPU orchestration platform. This is... ...managing clusters, deployments, and... Work at office, Local area, 1 day per week, $120k - $290k
Somi AI is looking for a Software Engineer to join their team in San Francisco. In this role, you will design and build systems that provision and scale Neki clusters, ensuring high availability and data protection. The ideal candidate will have 5+ years of software engineering...
- A leading AI research company in San Francisco is seeking engineers to operate next-gen compute clusters. The role requires scaling Kubernetes, automating infrastructure, and ensuring system reliability. Ideal candidates have strong Kubernetes and scripting skills with...
$200k - $400k
Inferact is seeking a dedicated cluster administration engineer to manage high-performance GPU compute infrastructure in San Francisco. This hands-on role focuses on optimizing system health and availability for engineering productivity. Ideal candidates will have substantial... Remote job, $300k
Aionia Group in San Francisco is seeking a Systems Infrastructure Engineer to build scalable infrastructure for RL experiments. This role... ...on innovative projects with leading researchers in a well-funded AI company. The ideal candidate has over 2 years of experience in...
- ...the world's most dynamic AI companies, like Cursor,... ...build the platform engineers turn to to ship AI products... ...multi‑modal workloads scale, the network is the... ...engineers to lead our GPU Networking efforts, making... ...performance on bleeding‑edge clusters (H100/H200, B200/B300,... Flexible hours
$300k
...building out their AI and cloud platform... ..., full-scale model training, or... ...inference. As a Platform Engineer/Senior Site Reliability... ...of this GPU-powered infrastructure... ...ensuring seamless orchestration across environments... ...of the largest GPU clusters in private deployment...
- Senior Site Reliability Engineer - AI Infrastructure... ...About Andromeda Andromeda Cluster was founded by Nat Friedman... ...access to the kind of scaled AI infrastructure once... ...systems, network, and orchestration layer that makes the... ...and debug large‑scale GPU infrastructure used... Full time, Remote work
- Site Reliability Engineer - AI Infrastructure Location: Global... ...Andromeda Andromeda Cluster was founded by Nat... ...access to the kind of scaled AI infrastructure once... ...systems, network, and orchestration layer that makes the world... .../AI infrastructure or GPU-based systems (CUDA,... Full time, Remote work
$179k - $218k
...the only vertically integrated AI infrastructure company built... ...urgency, who believe in the scale of our ambition and thrive on... ...Staff Data Center Operations Engineer, GPU Hardware Architecture to be the... ...needed to maintain peak cluster health. The Strategic Bridge... Temporary work
- Mistral in San Francisco is seeking a Systems Engineer/System Administrator to manage and scale its AI infrastructure. This hybrid role demands skills in Linux... ...in systems administration and experience with HPC clusters or cloud infrastructure. Join Mistral for a high-impact...
$172.5k - $210k
A cutting-edge AI infrastructure firm located in San Francisco is seeking a Senior Systems Performance Engineer. This role involves leading hardware evaluations and optimizing AI systems for performance. Candidates should have over 5 years of experience, proficiency in...
- A tech company focused on AI is seeking a Site Reliability Engineer to ensure the reliability and performance of its GPU marketplace. This role involves maintaining service level objectives, managing capacity, and implementing secure systems. The ideal candidate has strong...
- Senior Infrastructure Engineer - Bland As a Senior Infrastructure... ...anticipating and solving scaling challenges related to... ...industries. Lead - AI/ML Stack Infrastructure Lead... ...operating production Kubernetes clusters optimized for AI/ML workloads with GPU support, implementing... Temporary work
$300k
...Join a seed-stage AI infrastructure company building large-scale training and inference platforms... ...with a single managed GPU cluster that reached capacity... ..., networking, and orchestration. You lead technical... ...with both executives and engineers, and help create a repeatable... Permanent employment, Immediate start, $250k - $400k
A leading AI research firm in San Francisco seeks experienced professionals to build and scale systems for AI-driven scientific discovery. The role involves developing... ...base plus equity, with opportunities for ML Engineers, ML Infra, Research Engineers, and Research... Remote job, $335k
OpenAI in San Francisco seeks a System Engineer to architect and operationalize essential infrastructure for AI systems. The role demands 7+ years in systems engineering... ...experience debugging and a solid grasp of clustering and scaling in production environments. Offers a hybrid... Relocation package
- ...history. When people finance GPU clusters, the datacenters housing... ...to the market? Otherwise, as AI scales, compute only becomes available... ...metal servers with our VM orchestration software all the way to coordinating... ...assembly Understanding of CPU interrupts Networking... Long term contract, Contract work, Fixed term contract, Work at office, Local area, Visa sponsorship, Shift work