Heterogeneous AI Infra & Cluster Engineer
Gimlet Labs
Gimlet Labs in San Francisco is seeking an Infrastructure Platform Engineer to design and operate complex clusters for AI inference. This hands-on role involves working with diverse hardware architectures and orchestration systems to create scalable infrastructure solutions. The ideal candidate has strong Linux and Kubernetes experience, coupled with automation and debugging skills. Help shape AI infrastructure systems that will positively impact future workloads.
Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the Heterogeneous AI Infra & Cluster Engineer in San Francisco, CA vacancy
- Linuxcareers is seeking an Infrastructure/Cluster Engineer to design and operate large-scale clusters that enable AI inference at scale. The role focuses on managing diverse hardware architectures and building robust infrastructure. The ideal candidate will possess deep... [Suggested]
- ...interact with the web by building AI agents that can reliably do... ...Responsibilities: Scale infra for post-training of multimodal... ...Work closely with product engineers to translate cutting‑edge AI capabilities... ...with ML infrastructure (GPU clusters) and supporting networking (... [Suggested, Work at office, Relocation, Visa sponsorship]
- Neura Market is seeking an HPC Engineer to build and configure large-scale HPC clusters for AI workloads. This role requires working 4 days a week onsite in San Francisco/Bellevue, where you will collaborate closely with teams to troubleshoot and improve systems. The ideal... [Suggested]
$293k
...are seeking a Tokens-as-a-Service (TaaS) Engineer to help build the systems that convert... ...analysis, and model porting across heterogeneous infrastructure environments. Build tooling... ...Preferred Skills Experience with GPU clusters, AI infrastructure, performance benchmarking... [Suggested]
- Senior Infrastructure Engineer - Bland As a Senior Infrastructure Engineer at Bland, responsibilities... ...in regulated industries. Lead - AI/ML Stack Infrastructure Lead the team... ...designing and operating production Kubernetes clusters optimized for AI/ML workloads with GPU... [Suggested, Temporary work]
- ...accelerate the progress of AI applications out into... ...their laptop to the cluster without needing to be a... ...Senior Site Reliability Engineer to join the Infrastructure... ...laptop. As part of the Infra team, we build the... ...management systems for heterogeneous compute clusters Develop...
- Nohi, based in San Francisco, is seeking a Senior Full-Stack Engineer to lead the architecture and implementation of core systems such as products, orders, and payments. You will drive the end-to-end development, ensuring reliability across both frontend and backend. The...
- A leading AI technology firm in San Francisco is seeking an AI Infra Engineer to enhance their infrastructure. The successful candidate will design and maintain Kubernetes clusters and manage Slurm for distributed training. Important skills include extensive experience...
$190k - $270k
AI Chopping Block, Inc. is seeking an AI Infrastructure Engineer in San Francisco. This role requires maintaining user-facing services and production systems, specializing in systems while ensuring their reliability and scalability. Candidates should have 5+ years of experience...
$320k - $405k
...interpretable, and steerable AI systems. We want AI to be safe... ...group of committed researchers, engineers, policy experts, and business... ...Infrastructure Engineer, Node Infra About the role Anthropic's Infrastructure... ..., stand up and scale clusters from thousands to hundreds of... [Visa sponsorship]
$250k
LeoForce is seeking a Senior Software Engineer in San Francisco to design agentic systems and improve AI-native workflows. This hybrid role allows flexibility with 2-3 days in the office each week. Ideal candidates will have 2-8 years of software engineering experience... [Work at office]
$202.5k - $247.5k
...sharing localhost or running AI workloads in production. We’re... ...below worth your time. About the Infra Platform Team The Infra... ...team builds the systems ngrok engineers rely on to build, deploy, and... ...environments that run a full Kubernetes cluster of the ngrok stack, closely... [Permanent employment, Full time, Work at office, Local area, Remote work, Home office, Flexible hours]
$335k
OpenAI in San Francisco seeks a System Engineer to architect and operationalize essential infrastructure for AI systems. The role demands 7+ years in systems engineering... ...experience debugging and a solid grasp of clustering and scaling in production environments. Offers... [Relocation package]
- ...seeking an experienced Infrastructure Security Engineer to design and implement security in a... ...the long-term security roadmap for advanced AI systems. Your role includes auditing, hardening, and securing Kubernetes clusters, implementing centralized security controls,...
- ...A technology company specializing in AI infrastructure is seeking a Member of Technical... ...building compiler pipelines for heterogeneous hardware while ensuring performance and... ...ideal candidate should have solid software engineering skills and experience in compiler systems...
- A leading AI research organization in San Francisco is looking for a Senior Mechanical Infrastructure Engineer to develop reliable and efficient thermal systems for high-density AI data centers. The ideal candidate should have over 10 years of experience in mechanical...
- ...for the world's most dynamic AI companies, like Cursor, Notion... ...and help build the platform engineers turn to to ship AI products.... ...operating system for distributed, heterogeneous AI hardware. We believe that... ...performance on bleeding‑edge clusters (H100/H200, B200/B300, GB200/... [Flexible hours]
- OutSystems, Inc. is looking for a Site Reliability Engineer to join their team in San Francisco, CA. The ideal candidate will lead the onboarding of services and teams to reliability tenets while establishing SLOs and SLAs. Proficiency in Python and experience with Kubernetes... [Flexible hours]
- Happyrobot Inc. is looking for an Infrastructure Engineer in San Francisco, California. This role involves leading the stability and observability... ...as familiarity with monitoring tools. Join us at a high-growth AI startup backed by top investors, where you will have ownership...
$190k - $270k
AI Chopping Block, Inc. is looking for an AI Infrastructure Engineer based in San Francisco, California. The role involves ensuring production systems run smoothly, building infrastructure with tools like Ansible and Kubernetes, and implementing operational processes....
- AI Talent Now in San Francisco is looking for an engineer to take ownership of projects that innovate AI evaluation techniques. You will design and build scalable systems, collaborate with various teams, and ensure code quality through reviews and documentation. This role...
- Sr. Site Reliability Engineer Job type: Full Time · Department: Platform... ...) Optura is healthcare’s AI orchestration platform. We... ...operating it at scale across heterogeneous environments Multi‑cloud fluency... ...tooling (Replicated, Cluster API, Talos, Rancher, OpenShift... [Full time, Remote work]
$190k - $270k
AI Chopping Block, Inc. is looking for an AI Infrastructure Engineer to ensure smooth operation of user-facing services and production systems. You will handle infrastructure with tools like Ansible, Terraform, and Kubernetes while participating in on-call rotations for...
$225k
Australia-Employment is seeking a Senior Software Engineer in San Francisco to design and implement AI-native workflows and infrastructure. This role offers a competitive salary between $225,000 and $450,000 per year. You will work on diverse challenges in a hybrid environment...
- Senior Site Reliability Engineer - AI Infrastructure Location: Global Remote / San Francisco • Full-Time About Andromeda Andromeda Cluster was founded by Nat Friedman and Daniel Gross to... ...). Own capacity planning across heterogeneous GPU fleets optimized for training throughput... [Full time, Remote work]
$150k
Tzafon is seeking a skilled engineer to enhance their machine intelligence systems in San Francisco. As part of the team, you'll be responsible for building evaluation infrastructure, designing data pipelines, and implementing fine-tuning processes. Ideal candidates have...
- ...building the next hyperscaler for AI agents. About the role You... ...tightening the loop between infra change and production behavior... ...looking for an infrastructure engineer who actually wants to live in... ...- You've operated Kubernetes clusters past the tutorial stage: real... [Live in, Work from home]
- ...Technical Staff to contribute to model training pipelines and produce state-of-the-art models. Candidates should possess strong software engineering skills, especially in Python and ML frameworks like JAX and Pytorch. Experience with distributed training infrastructures is... [Remote job]
- A cutting-edge AI infrastructure startup in San Francisco is seeking a Senior Multidisciplinary Engineer to innovate how physical structures are built. The ideal candidate will leverage deep engineering expertise across mechanical, electrical, and systems engineering to...
- CoffeeSpace, an AI infrastructure startup in San Francisco, is seeking a Senior Founding Engineer to shape backend systems and data evaluation for frontier AI. You will collaborate directly with founders and have impactful ownership in a fast-growing environment. The ideal...


