HPC Operations Engineer — AI Cloud Infra (On-site 4d/wk)
Lambda Inc.
Lambda Inc. is seeking an experienced HPC Engineer to join our team in San Francisco. In this role, you will be responsible for deploying and configuring large-scale HPC clusters for AI workloads, troubleshooting issues, and mentoring junior engineers. The ideal candidate will have 5+ years of experience, a strong understanding of HPC/AI architecture, and a collaborative spirit. Join us at Lambda to help build the future of AI cloud infrastructure.
- A tech company focused on AI is seeking a Site Reliability Engineer to ensure the reliability and performance of its GPU marketplace. This role involves maintaining service level objectives, managing capacity, and implementing secure systems. The ideal candidate has strong...
- ...The Superintelligence Cloud, is a leader in AI cloud infrastructure... ...currently Tuesday. Engineering at Lambda is responsible... ...large-scale HPC clusters for AI workloads... ...install and configure operating systems, firmware, software... ...deployment teams on-site Provide clear and... Work experience placement · Work at office · Local area · Remote work · Work from home · Flexible hours
$202.5k - $247.5k
...ngrok is an all‑in‑one cloud networking platform... ...or running AI workloads in production... ...device fleets, and site‑to‑site connectivity... ...your time. About the Infra Platform Team The Infra... ...builds the systems ngrok engineers rely on to build, deploy, and operate ngrok itself. We... Permanent employment · Full time · Work at office · Local area · Remote work · Home office · Flexible hours
- Neura Market is seeking an HPC Engineer to build and configure large-scale HPC clusters for AI workloads. This role requires working 4 days a week onsite in San Francisco/Bellevue, where you will collaborate closely with teams to troubleshoot and improve systems. The ideal... Suggested
$250k - $400k
...Founding Engineer Opportunity Location: San... ...Type: Full-time, on-site We are seeing a... ...to give inboxes to AI agents and be the sole... ...strong backend and infra instincts to help... ...authenticate, and operate in the real world.... ...Experience with cloud infrastructure, distributed... Full time · Work experience placement
- Senior Software Engineer, Infrastructure & Agents About Reacher... ...'re making a big bet on AI agents and what they can... ...main buckets of work: Infra Re-architect our jobs... ...architecture GCP/cloud infrastructure (GKE/KEDA... ...still ahead Location On-site in San Francisco. This role... Flexible hours
- ...Senior Platform Engineer (Cloud Platform) San Francisco, CA... ...Amplitude is the leading AI analytics platform,... ...living our values. We operate from a place of humility... ...assisted development—think infra primitives that are... ...engineering, DevOps, or Site Reliability Engineering... Shift work
$110k - $120k
...partner for middle market companies, operating through three business lines: Audax... ...POSITION SUMMARY: The IT Operations Engineer serves as the sole on-site IT resource for Audax Group's San... .... ~ Familiarity with enterprise AI tools (e.g., Microsoft Copilot, ChatGPT... Contract work · Work at office · Local area · Remote work · Relocation · Night shift
$160k - $300k
Hebbia, Inc. in San Francisco is seeking a Site Reliability Engineer to own critical production systems. You will be responsible for designing, building, and improving these systems while writing production-quality code. The ideal candidate has over 5 years of software...
$140k - $150k
...As a Client Platform Engineer, you will connect IT and Engineering for AI tools and automation at... ...company. At Nextdoor, we operate in an AI-first environment... ...such as trainings, off-sites, volunteer days, and... ...patterns across Google Cloud and AWS, including the ability... Work at office · Local area · Work from home
$150k - $300k
...Guillermo Rauch (Vercel CEO) The Role We are looking for an AI Cloud Infra Engineer to join our infrastructure team. This role will be... ...know how to design for reliability and scale with minimal operational overhead. You learn new technologies rapidly because you’re...
- ...Head of Corporate Engineering, you will be responsible... ...engineering and operations globally. You will... ...and optimizing cloud infrastructure,... ...on-call support, Infra as Code, observability... ...-as-code, SRE (Site Reliability Engineering... ...is the data and AI company. More than... Work experience placement · Remote work · Worldwide
$179k - $218k
...Senior Staff Data Center Operations Engineer, GPU Hardware Architecture Crusoe... ...only vertically integrated AI infrastructure company built... ...data center construction, and cloud services. If you want to... ...generation facilities. For Site Operations: You are the "... Temporary work
- Phonely in San Francisco is seeking an experienced DevOps Engineer to join our engineering team and help build reliable cloud infrastructure for voice AI systems. This role is fully on-site and essential to our fast-paced business environment. The ideal candidate will have...
- ...in San Francisco is seeking senior platform engineers to build efficient infrastructure that supports both traditional and AI workloads. The ideal candidate has 3+ years... ...core architectures and mentor peers to improve operational resilience. This position requires in-office... Work at office · 3 days per week
$200k - $275k
...Senior Backend Engineer (Infra/Platform/SRE) Title of Role: Senior Backend Engineer (Infra... ...services industry with innovative AI creative tools tailored for artists and... ...infrastructure utilizing Kubernetes and cloud platforms such as AWS, GCP, and Azure.... Work at office
$230k - $325k
...the Team The Codex Cloud Apps team builds cloud-... ..., deployed, and operated. We own end-user experiences such as ChatGPT Sites, Code Review, and future... ...increasingly complex work to AI. Our team sits at... ...intersection of product, engineering, design, and research....
- ...ultimately become the perception engine for a company’s physical... ...perimeter visibility, autonomous operations management, and “digital twinning... ...approaching world of physical AI and robotics. We are a small,... ...for long days, remote work sites, and hard, physical work. Desired... Local area · Remote work
- A fast-growing AI company in San Francisco is seeking a Senior/Staff Infrastructure Engineer to build and operate cloud infrastructure. This full-time, hybrid role focuses on GCP, Kubernetes, and infrastructure-as-code. You will be responsible for securing deployments... Full time
$156.86k - $191.72k
...Berkeley National Laboratory is hiring an HPC Scientific Support Engineer for the NERSC division, the U.S.... ...project work. Monitor emerging HPC and AI trends and identify opportunities for... ...Work modality: Work may be performed on‑site, hybrid, or full‑time telework. The... Full time · Work at office · Remote work
- ...Us Conversion is the AI-native marketing automation... ...San Francisco and includes engineers, designers, and operators from Airbnb, Palantir, Pinterest... ...our customers. The best infra work here is measured in... ...years of experience building cloud infrastructure, dev tooling...
- Crusoe Energy Systems LLC is seeking a Senior Engineering Manager to lead a talented team in revolutionizing our cloud infrastructure. You will drive the Insights & Actions... ...projects. Join us in building the future of AI infrastructure!
- Crusoe is seeking a Senior Engineering Manager to lead a team focused on enhancing cloud infrastructure. The role involves building systems that convert raw infrastructure... .... Join Crusoe and contribute to shaping the future of AI infrastructure. Epoch Biodesign · Full time
$127k - $225k
...Waabi, founded by AI visionary Raquel Urtasun, is... ...visit: As a Software Engineer on our Labelling and Data... ...- Understanding of cloud job orchestration, monitoring... ...Experience working with infra as code (Terraform, CloudFormation... ...social events both on-site, off-site & virtually.... Full time · Work at office · Work from home · Flexible hours
- ...startup that is building the AI backbone for the next generation... ...are hiring a Backend Software Engineer (ML Infrastructure) to help... ...distributed training pipelines, cloud-native infrastructure, and internal... ...). - Excited to work on-site in San Francisco with a fast-...
$148.5k - $223.9k
...Senior Member of Technical Staff (SMTS) - Site Reliability Engineer (Cloud Automation) Location: New York, NY... ...Salesforce Salesforce is the #1 AI CRM, where humans with agents drive... ...Platform Engineering team builds and operates the highly available, active-active... Work experience placement · Shift work
$120k - $160k
...capabilities to the world’s biggest AI Labs at industry-defining... ...Technical Writer to join our Operations team. In this role, you will... ..., vendor specifications, engineering drawings, and product data sheets... ..., and archiving — within the site DMS; maintain revision... Work at office · Local area
- ...Bedrock, we’re moving AI out of the lab and into... ...improving safety on job sites. Backed by $350M in funding... ...and world‑class engineers to solve physical‑world... ...Engineering, Commercial, and Operations. You will partner... ...Do Manage execution of cloud platform, fleet technology... Work at office · Flexible hours
- ...leading 3D generative AI company on a mission... ...seeking a Software Engineer Intern to join our Data Infra team and help... ...agent‑driven workflows Cloud infrastructure on AWS... ...Work On Design and operate CI/CD pipelines covering... ...— this role is on‑site / hybrid. Our... Internship · Work at office · Remote work · Flexible hours · 1 day per week
- ...Platform/Infrastructure engineer to help shape how... ...environment setup, operations, policy checks, on-... ...platform work with AI Respond to... ...spent in a platform, infra, or SRE role ~ Kubernetes... ...Have Google Cloud Platform experience... ...in-office or on-site at least four days... Permanent employment · Work at office · Local area · Flexible hours

