Member of Technical Staff - Training Platform
$150k - $300k
Prime Intellect
Building Open Superintelligence Infrastructure

Prime Intellect is building the open superintelligence stack - from frontier agentic models to the infrastructure that lets anyone create, train, and deploy them. We aggregate and orchestrate global compute into a single control plane and pair it with the full RL post-training stack: environments, secure sandboxes, verifiable evals, and our async RL trainer. We enable researchers, startups, and enterprises to run end-to-end reinforcement learning at frontier scale, adapting models to real tools, workflows, and deployment contexts.

We recently raised $15M in funding (taking total funding to $20M), led by Founders Fund with participation from Menlo Ventures and prominent angels including Andrej Karpathy (Eureka Labs, Tesla, OpenAI), Tri Dao (Chief Scientist, Together AI), Dylan Patel (SemiAnalysis), Clem Delangue (Hugging Face), Emad Mostaque (Stability AI), and many others.

Role Impact

You'll help build our hosted training platform - the product that lets users launch LoRA and full fine-tuning runs on managed GPU clusters with a single API call or a few clicks. The role spans the developer-facing platform and the underlying Kubernetes-based training infrastructure that runs the jobs.

Core Technical Responsibilities

Hosted Training Infrastructure
- Design and operate Kubernetes-based training and inference orchestration across multi-cluster, multi-cloud GPU fleets
- Build and maintain Helm charts that compose trainers, inference servers, environment servers, and supporting services into reproducible "training stacks"
- Develop the Python control-plane agents that watch pods, report run state to the platform, and keep clusters in sync
- Implement scheduling and autoscaling for heterogeneous hardware (H100/H200/B200) using KEDA, LeaderWorkerSet, taints/tolerations, and gang scheduling
- Run a tight GitOps workflow - every change ships through PRs, Helm values, and CI
- Build node-local model caches, checkpoint pipelines, and shared storage for fast cold starts
- Operate the observability stack (Prometheus, Grafana, Loki, DCGM) and make GPU cluster debugging fast

Platform Development
- Build the developer-facing surfaces for hosted training: job submission, live run monitoring, logs, metrics, model/adapter management, comparisons
- Develop FastAPI backend services and REST APIs that bridge the platform to running clusters
- Build real-time monitoring and debugging tools (streaming logs, step-level metrics, failure analysis)
- Ship product UI in Next.js / React / TypeScript with shadcn, Tailwind, tRPC, and TanStack Query

Research Bridge
- Interface with the RL trainer, inference servers, and environment servers running inside our clusters
- Productize new training capabilities (new model architectures, RL algorithms, modes)

Technical Requirements

We're looking for engineers who are fluent across three areas - you don't need to be the world's best at any one, but you should have real depth in all three and a clear point of view on how they connect.

AI & GPU Landscape
- Strong working knowledge of the modern AI stack - open model families, fine-tuning techniques (LoRA, QLoRA, full FT, RLHF/RLAIF), inference engines (vLLM, SGLang, TensorRT-LLM)
- Familiarity with GPU hardware tradeoffs (H100 / H200 / B200, NVLink, interconnects, memory hierarchy) and what they mean for training and inference workloads
- Understanding of distributed training fundamentals (data/tensor/pipeline/expert parallelism, NCCL, multi-node scheduling)
- Awareness of what's happening at the frontier (new models, training methods, infra patterns) and the ability to translate that into product decisions

Kubernetes & Infrastructure
- Strong Kubernetes operations experience - Helm, CRDs, operators, KEDA, gang scheduling, GPU operator
- Comfortable debugging real production clusters (kubectl, pod lifecycle, node issues, networking)
- Cloud platform experience (GCP preferred - GCS, GKE, Cloud Run, Cloud Tasks)
- Infrastructure automation (Helm, Terraform, Ansible) and a GitOps mindset
- Observability: Prometheus, Grafana, Loki, OpenTelemetry, DCGM
- Linux fundamentals: networking, namespaces, performance tuning

Programming & Platform
- Strong Python backend development (FastAPI, async, SQLAlchemy)
- Comfortable building Python control-plane agents that talk to Kubernetes APIs
- Modern frontend development (TypeScript, React/Next.js, Tailwind, shadcn) - enough to ship product surfaces end-to-end
- REST and tRPC API design
- Experience building developer tools, dashboards, and live-monitoring UIs

What We Offer
- Cash compensation of $150K–$300K with significant equity
- Flexible work arrangement (remote or San Francisco office)
- Full visa sponsorship and relocation support
- Professional development budget for courses and conferences
- Regular team off-sites and conference attendance
- Opportunity to shape the future of decentralized AI development

Growth Opportunity

You'll join a team of experienced engineers and researchers working on cutting-edge problems in AI infrastructure.
We believe in open development and encourage team members to contribute to the broader AI community through research and open-source work. We value potential over perfection - if you're passionate about democratizing AI development and have experience in either platform or infrastructure development (ideally both), we want to talk to you.

Ready to help shape the future of AI? Apply now and join us in our mission to make powerful AI models accessible to everyone.

