
ML Infrastructure Engineer: GPU Training & Serving

Character

Join a dynamic team as a Machine Learning Infrastructure Engineer at Character.AI, where you'll enhance infrastructure for machine learning endeavors. This role requires substantial experience and expertise in cloud platforms, GPU management, and diagnostic tooling. Contribute to pioneering AI technology in an award-winning company recognized for its innovative approach to interactive entertainment.

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the ML Infrastructure Engineer: GPU Training & Serving in San Francisco, CA vacancy
  • $200k - $280k

     ...Engineering San Francisco Full-time $200,000 - $280,000 About the Role Join our ML Infrastructure team to build the systems that train, deploy, and serve our AI models at scale. You'll work at the intersection...  ...observability for ML systems Optimize GPU utilization and reduce... 
    Training
    Full time
    Work at office

    Lattice

    San Francisco, CA
    2 days ago
  • $300k - $430k

     ...team. About the Team The ML Infrastructure team builds the...  ...the platforms for model training, the infrastructure for...  ...models train efficiently, serve reliably, and deliver...  ...ML Infrastructure Engineer to own the platforms powering...  ...training: multi-node GPU clusters, fault... 
    Training
    Work at office

    Decagon

    San Francisco, CA
    2 days ago
  • $180k - $250k

     ...developing the context engine layer that solves a...  ...come from better infrastructure around models: Better...  ...PhD in Robotics and ML. Clark Zhang, CTO:...  ...infrastructure, such as training pipelines, inference/serving systems, data...  ...data processing, or GPU optimization. Exposure... 
    Training
    Full time

    Graphon.AI

    San Francisco, CA
    3 days ago
  •  ...What You’ll Do Training Automation: Design and...  .... Evaluation Infrastructure: Build scalable evaluation...  ..., and error rates GPU utilization and...  ...Computer Science, Engineering, or equivalent...  ...Engineering, MLOps, or ML Infrastructure...  ...Familiarity with LLM serving stacks such as... 
    Training
    Immediate start
    Relocation package
    Night shift

    AGI

    San Francisco, CA
    2 days ago
  •  ...we offer an innovative GPU marketplace and AI...  ...We're seeking a Senior Infrastructure Engineer to help build and scale...  ...orchestrated pool that serves thousands of AI developers...  ...infrastructure for AI/ML workloads, including...  ...file systems for training data and checkpoints Proficiency... 
    Training
    Remote work

    deCircle

    San Francisco, CA
    3 days ago
  •  ...Senior HPC & GPU Infrastructure Engineer Sciforium is an AI infrastructure company...  ..., high-efficiency serving platform. Backed by multi-million...  ...bring-up to maintaining the ML software stack (CUDA/ROCm,...  ...frameworks, and distributed training runtimes (e.g., vLLM... 
    Training
    Flexible hours

    Sciforium

    San Francisco, CA
    3 days ago
  •  ...Responsibility The AI Infrastructure team at Zensors builds the engine that powers our...  ...Learning Engineer in ML Runtime &...  ...to accelerate the training and inference of computer...  ...Deep understanding of GPU hardware performance...  ...cloud‑scale inference serving (e.g., Triton... 
    Training
    Work at office

    Zensors

    San Francisco, CA
    2 days ago
  •  ...consumer AI investments is hiring an ML Infrastructure Engineer. The founding team helped build...  ...ML Infra hire helping scale training and inference systems that directly...  ...for training and inference (GPU compute, orchestration, model serving) Own core systems for data and... 
    Training
    Full time

    Greylock Partners

    San Francisco, CA
    2 days ago
  •  ...Machine Learning Infrastructure Engineer Join to apply for the Machine Learning...  ...We’re looking for seasoned ML Infrastructure engineers...  ..., building and maintaining training and serving infrastructure for ML...  ...support our research Maximize GPU allocation and utilization... 
    Training
    Full time
    Internship

    Character

    San Francisco, CA
    2 days ago
  • $250k - $380k

     ...running OpenAI’s LLM training and inference infrastructure that powers frontier...  ...researchers train and serve models, abstracting away...  ...across vast GPU/accelerator fleets. By...  ...We are looking for an engineer to design and implement...  ...glamorous) part of the ML stack. Bonus points if... 
    Training
    Full time
    Work at office
    Local area
    Relocation package
    Flexible hours

    Slope

    San Francisco, CA
    2 days ago
  • $245k - $345k

     ...updates on our news and engineering blogs and join us as we...  ...the future of AI and ML at Whatnot. You’ll design and scale the core infrastructure that powers machine learning...  ...‑latency, large model serving to distributed training & high‑throughput GPU inference. What you'll... 
    Training
    Work experience placement
    Work at office
    Local area
    Remote work
    Work from home
    Home office
    Flexible hours

    Whatnot

    San Francisco, CA
    14 days ago
  • $179k - $248k

     ...Machine Learning Infrastructure Engineer Join to apply for the Machine Learning...  ...clusters for AI model inference and training Develop, optimize, and maintain ML model serving and training infrastructure,...  ...‑heavy workflows and enhance GPU utilization for ML workloads.... 
    Training
    Hourly pay
    Full time
    Flexible hours

    Abridge

    San Francisco, CA
    2 days ago
  •  ...We are seeking a Data Infrastructure Engineer to build and operate the infrastructure...  ..., storage, compute, and serving, with a strong emphasis on...  ...and operate scalable data and ML infrastructure on AWS,...  ...to support perception model training and evaluation workflows, enabling... 
    Training
    Permanent employment
    Full time

    Matter Intelligence

    San Francisco, CA
    4 days ago
  • $293k

     ...OpenAI is seeking a Software Engineer for Monetization ML Infrastructure in San Francisco. This role involves designing the machine learning...  ...'ll work on large-scale data pipelines, model training platforms, and real-time serving infrastructure, all while ensuring strict... 
    Training

    OpenAI

    San Francisco, CA
    3 days ago
  •  ...Build the data infrastructure for robots operating in the real...  ...or need to be improved, engineers rely on data to understand...  ...We're looking for a ML Platform Engineer with deep...  ...itself, from inference serving and pipeline orchestration to training infrastructure and evaluation... 
    Training
    Remote work

    Foxglove Technologies, Inc

    San Francisco, CA
    1 day ago
  • $166k - $225k

     ...machine learning engineers and researchers, Mosaic...  ...fine‑tune, train and deploy custom...  ...platform for the ML development lifecycle...  ...training, evaluation, serving, and agent‑...  ...features to low‑level GPU orchestration. You...  ...the core platform infrastructure that supports our... 
    Training
    Local area
    Worldwide

    Cacheflow

    San Francisco, CA
    2 days ago
  • $250k - $300k

     ...team, every quarter. Our engineering roles are hybrid in...  ...ll Own: Eval & Release Infrastructure — Automated graders...  ...usage and convert it into training signal. End‑to‑end...  ...customization at scale. Model Serving — Performance and...  ..., 3+ focused on ML infrastructure, platform... 
    Training
    Work at office
    Immediate start

    Ambience

    San Francisco, CA
    2 days ago
  • $250k - $350k

     ...what makes them actually work. We’re hiring ML Infrastructure Engineers to tackle a hard, real-world problem,...  ...pipelines handling millions of hours of data Training and inference systems for multimodal / LLM-based models GPU infrastructure and performance optimisation... 
    Training

    Trades Workforce Solutions

    San Francisco, CA
    2 days ago
  • $100k - $200k

     ...that scales voice and chat AI agents ML‑Infrastructure Engineer Salary $100K - $200K Equity 0.20% - 1....  ...model infrastructure end to end: Scaling GPU and compute infrastructure. Architect...  ..., ideally involving GPUs and model serving. You're a hardware nerd at heart. You... 
    Full time
    Live in
    Work at office

    Voiceflow

    San Francisco, CA
    2 days ago
  •  ...ML Infrastructure Engineer Spectral Labs is a spatial intelligence company building reasoning models for engineering physical systems...  ...better. Responsibilities Optimize distributed training & RL across our GPU cluster of hundreds of H100 GPUs (FSDP, DeepSpeed, or... 
    Training

    Spectral Labs

    San Francisco, CA
    5 days ago
  •  ...The problem we saw Most AI infrastructure is built for batch: send...  ...infrastructure. As our ML Infrastructure and Platform Engineer, you will own the architecture and scaling of our GPU compute platform from the...  ...from bare metal to model serving. You will work directly with... 
    Flexible hours
    Shift work

    U-Run

    San Francisco, CA
    2 days ago
  •  ...Quantum-Accelerated AI Server Engineer Sygaldry Technologies...  ...exponentially speed up training and inference for AI. By...  ...AI. They need compute infrastructure that stays out of their way: GPU access that's reliable, experiments...  ...) Python-based ML and scientific computing... 
    Training
    Casual work
    Local area
    Visa sponsorship

    Sygaldry

    San Francisco, CA
    7 hours ago
  •  ...through advanced hardware engineering and AI solutions. Our...  ...Machine Learning Infrastructure Engineer to join our team...  ..., production-grade ML ecosystem to support rapid...  ...scalable distributed training pipelines, with...  ...model sharding, cross-GPU communication, and real... 
    Training
    Flexible hours

    Echo Neurotechnologies

    San Francisco, CA
    3 days ago
  • $183.7k - $248.6k

     ...looking for a Senior Machine Learning Infrastructure Engineer to join our Vector Ads team, where...  ...the infrastructure that brings ML models from training into production, ensuring our ranking...  ...and maintain the infrastructure that serves ML models in real-time across Unity... 
    Training
    Work at office
    Remote work
    Worldwide
    Relocation package

    UNITY

    San Francisco, CA
    2 days ago
  •  ...focused on scaling and optimizing ML training systems. Key responsibilities include owning the training infrastructure, improving performance, and managing GPU/TPU compute resources. Ideal candidates will have strong software engineering foundations, hands-on experience... 
    Training

    Physical Intelligence

    San Francisco, CA
    1 day ago
  •  ...Mach9 ML Engineer Role At Mach9, ML Engineers build...  ...allows us to develop and train cutting edge 3D scene...  ...understanding models that serve real surveyors and...  ...partner closely with infrastructure and product teams to take...  ...with multi-GPU training and experiment... 
    Training

    Mach9

    San Francisco, CA
    7 hours ago
  •  ...About the Role ML Ops Engineer — Agentic AI Lab (Founding...  ...industry’s most critical infrastructure problems. Our AI Lab...  ...automating the model training, deployment,...  ...compute orchestration, GPU infrastructure, fine-...  ...Familiarity with inference serving: vLLM, TGI, Ray Serve... 
    Training
    Full time

    Fabrion

    San Francisco, CA
    6 days ago
  • $292k - $417.2k

     ...Director, ML Engineering & Infrastructure San Francisco, CA; Los Angeles, CA; New York, NY (Hybrid); USA - Remote About the Role...  ...pipelines: data ingestion, feature engineering, model training, evaluation, and serving. Architect distributed systems to support ML... 
    Training
    Full time
    Temporary work
    Local area
    Remote work
    Flexible hours

    Tubi

    San Francisco, CA
    9 days ago
  • A cutting-edge AI technology company based in San Francisco is seeking a specialist to design and operate large-scale GPU infrastructure. This role requires expertise in deploying GPU systems for high-throughput inference and model performance optimization. The ideal candidate... 
    Training

    Reflection AI

    San Francisco, CA
    3 days ago
  •  ...new Machine Learning Engineer opportunities posted...  ...ingestion, preprocessing, training, testing, and...  ...optimize end-to-end ML pipelines encompassing...  ...current with the latest AI infrastructure, tooling, and...  ...latency behavior, and GPU and model-serving platforms for LLM inference... 
    Training
    Flexible hours

    AI Chopping Block, Inc.

    San Francisco, CA
    1 day ago
