ML Infrastructure Engineer: GPU Training & Serving
Character
Join a dynamic team as a Machine Learning Infrastructure Engineer at Character.AI, where you'll enhance infrastructure for machine learning endeavors. This role requires substantial experience and expertise in cloud platforms, GPU management, and diagnostic tooling. Contribute to pioneering AI technology in an award-winning company recognized for its innovative approach to interactive entertainment.
$200k - $280k
...Engineering San Francisco Full-time $200,000 - $280,000 About the Role Join our ML Infrastructure team to build the systems that train, deploy, and serve our AI models at scale. You'll work at the intersection... ...observability for ML systems Optimize GPU utilization and reduce... Training, Full time, Work at office
$300k - $430k
...team. About the Team The ML Infrastructure team builds the... ...the platforms for model training, the infrastructure for... ...models train efficiently, serve reliably, and deliver... ...ML Infrastructure Engineer to own the platforms powering... ...training: multi-node GPU clusters, fault... Training, Work at office
$180k - $250k
...developing the context engine layer that solves a... ...come from better infrastructure around models: Better... ...PhD in Robotics and ML. Clark Zhang, CTO:... ...infrastructure, such as training pipelines, inference/serving systems, data... ...data processing, or GPU optimization. Exposure... Training, Full time
- ...What You’ll Do Training Automation: Design and... .... Evaluation Infrastructure: Build scalable evaluation... ..., and error rates GPU utilization and... ...Computer Science, Engineering, or equivalent... ...Engineering, MLOps, or ML Infrastructure... ...Familiarity with LLM serving stacks such as... Training, Immediate start, Relocation package, Night shift
- ...we offer an innovative GPU marketplace and AI... ...We're seeking a Senior Infrastructure Engineer to help build and scale... ...orchestrated pool that serves thousands of AI developers... ...infrastructure for AI/ML workloads, including... ...file systems for training data and checkpoints Proficiency... Training, Remote work
- ...Senior HPC & GPU Infrastructure Engineer Sciforium is an AI infrastructure company... ..., high-efficiency serving platform. Backed by multi-million... ...bring-up to maintaining the ML software stack (CUDA/ROCm,... ...frameworks, and distributed training runtimes (e.g., vLLM... Training, Flexible hours
- ...Responsibility The AI Infrastructure team at Zensors builds the engine that powers our... ...Learning Engineer in ML Runtime &... ...to accelerate the training and inference of computer... ...Deep understanding of GPU hardware performance... ...cloud‑scale inference serving (e.g., Triton... Training, Work at office
- ...consumer AI investments is hiring an ML Infrastructure Engineer. The founding team helped build... ...ML Infra hire helping scale training and inference systems that directly... ...for training and inference (GPU compute, orchestration, model serving) Own core systems for data and... Training, Full time
- ...Machine Learning Infrastructure Engineer Join to apply for the Machine Learning... ...We’re looking for seasoned ML Infrastructure engineers... ..., building and maintaining training and serving infrastructure for ML... ...support our research Maximize GPU allocation and utilization... Training, Full time, Internship
$250k - $380k
...running OpenAI’s LLM training and inference infrastructure that powers frontier... ...researchers train and serve models, abstracting away... ...across vast GPU/accelerator fleets. By... ...We are looking for an engineer to design and implement... ...glamorous) part of the ML stack. Bonus points if... Training, Full time, Work at office, Local area, Relocation package, Flexible hours
$245k - $345k
...updates on our news and engineering blogs and join us as we... ...the future of AI and ML at Whatnot. You’ll design and scale the core infrastructure that powers machine learning... ...‑latency, large model serving to distributed training & high‑throughput GPU inference. What you'll... Training, Work experience placement, Work at office, Local area, Remote work, Work from home, Home office, Flexible hours
$179k - $248k
...Machine Learning Infrastructure Engineer Join to apply for the Machine Learning... ...clusters for AI model inference and training Develop, optimize, and maintain ML model serving and training infrastructure,... ...‑heavy workflows and enhance GPU utilization for ML workloads.... Training, Hourly pay, Full time, Flexible hours
- ...We are seeking a Data Infrastructure Engineer to build and operate the infrastructure... ..., storage, compute, and serving, with a strong emphasis on... ...and operate scalable data and ML infrastructure on AWS,... ...to support perception model training and evaluation workflows, enabling... Training, Permanent employment, Full time
$293k
...OpenAI is seeking a Software Engineer for Monetization ML Infrastructure in San Francisco. This role involves designing the machine learning... ...'ll work on large-scale data pipelines, model training platforms, and real-time serving infrastructure, all while ensuring strict... Training
- ...Build the data infrastructure for robots operating in the real... ...or need to be improved, engineers rely on data to understand... ...We're looking for a ML Platform Engineer with deep... ...itself, from inference serving and pipeline orchestration to training infrastructure and evaluation... Training, Remote work
$166k - $225k
...machine learning engineers and researchers, Mosaic... ...fine‑tune, train and deploy custom... ...platform for the ML development lifecycle... ...training, evaluation, serving, and agent‑... ...features to low‑level GPU orchestration. You... ...the core platform infrastructure that supports our... Training, Local area, Worldwide
$250k - $300k
...team, every quarter. Our engineering roles are hybrid in... ...ll Own: Eval & Release Infrastructure — Automated graders... ...usage and convert it into training signal. End‑to‑end... ...customization at scale. Model Serving — Performance and... ..., 3+ focused on ML infrastructure, platform... Training, Work at office, Immediate start
$250k - $350k
...what makes them actually work. We’re hiring ML Infrastructure Engineers to tackle a hard, real-world problem,... ...pipelines handling millions of hours of data Training and inference systems for multimodal / LLM-based models GPU infrastructure and performance optimisation... Training
$100k - $200k
...that scales voice and chat AI agents ML‑Infrastructure Engineer Salary $100K - $200K Equity 0.20% - 1.... ...model infrastructure end to end: Scaling GPU and compute infrastructure. Architect... ..., ideally involving GPUs and model serving. You're a hardware nerd at heart. You... Full time, Live in, Work at office
- ...ML Infrastructure Engineer Spectral Labs is a spatial intelligence company building reasoning models for engineering physical systems... ...better. Responsibilities Optimize distributed training & RL across our GPU cluster of hundreds of H100 GPUs (FSDP, DeepSpeed, or... Training
- ...The problem we saw Most AI infrastructure is built for batch: send... ...infrastructure. As our ML Infrastructure and Platform Engineer, you will own the architecture and scaling of our GPU compute platform from the... ...from bare metal to model serving. You will work directly with... Flexible hours, Shift work
- ...Quantum-Accelerated AI Server Engineer Sygaldry Technologies... ...exponentially speed up training and inference for AI. By... ...AI. They need compute infrastructure that stays out of their way: GPU access that's reliable, experiments... ...) Python-based ML and scientific computing... Training, Casual work, Local area, Visa sponsorship
- ...through advanced hardware engineering and AI solutions. Our... ...Machine Learning Infrastructure Engineer to join our team... ..., production-grade ML ecosystem to support rapid... ...scalable distributed training pipelines, with... ...model sharding, cross-GPU communication, and real... Training, Flexible hours
$183.7k - $248.6k
...looking for a Senior Machine Learning Infrastructure Engineer to join our Vector Ads team, where... ...the infrastructure that brings ML models from training into production, ensuring our ranking... ...and maintain the infrastructure that serves ML models in real-time across Unity... Training, Work at office, Remote work, Worldwide, Relocation package
- ...focused on scaling and optimizing ML training systems. Key responsibilities include owning the training infrastructure, improving performance, and managing GPU/TPU compute resources. Ideal candidates will have strong software engineering foundations, hands-on experience... Training
- ...Mach9 ML Engineer Role At Mach9, ML Engineers build... ...allows us to develop and train cutting edge 3D scene... ...understanding models that serve real surveyors and... ...partner closely with infrastructure and product teams to take... ...with multi-GPU training and experiment... Training
- ...About the Role ML Ops Engineer — Agentic AI Lab (Founding... ...industry’s most critical infrastructure problems. Our AI Lab... ...automating the model training, deployment,... ...compute orchestration, GPU infrastructure, fine-... ...Familiarity with inference serving: vLLM, TGI, Ray Serve... Training, Full time
$292k - $417.2k
...Director, ML Engineering & Infrastructure San Francisco, CA; Los Angeles, CA; New York, NY (Hybrid); USA - Remote About the Role... ...pipelines: data ingestion, feature engineering, model training, evaluation, and serving. Architect distributed systems to support ML... Training, Full time, Temporary work, Local area, Remote work, Flexible hours
- A cutting-edge AI technology company based in San Francisco is seeking a specialist to design and operate large-scale GPU infrastructure. This role requires expertise in deploying GPU systems for high-throughput inference and model performance optimization. The ideal candidate... Training
- ...new Machine Learning Engineer opportunities posted... ...ingestion, preprocessing, training, testing, and... ...optimize end-to-end ML pipelines encompassing... ...current with the latest AI infrastructure, tooling, and... ...latency behavior, and GPU and model-serving platforms for LLM inference... Training, Flexible hours