
Member of Technical Staff - Inference

$150k - $300k

Prime Intellect

Prime Intellect is building the open superintelligence stack - from frontier agentic models to the infra that enables anyone to create, train, and deploy them. We aggregate and orchestrate global compute into a single control plane and pair it with the full RL post‑training stack: environments, secure sandboxes, verifiable evals, and our async RL trainer. We enable researchers, startups and enterprises to run end‑to‑end reinforcement learning at frontier scale, adapting models to real tools, workflows, and deployment contexts.

Role Impact

This is a hybrid position spanning cloud LLM serving, LLM inference optimization and RL systems. You will be working on advancing our ability to evaluate and serve models trained with our RL Lab at scale. The two key areas are:
  • Building the infrastructure to serve LLMs efficiently at scale.
  • Optimization and integration of inference systems into our RL training stack.

Core Technical Responsibilities

LLM Serving
  • Multi‑tenant LLM Serving: Build a multi‑tenant LLM serving platform that operates across our cloud GPU fleets.
  • GPU‑Aware Scheduling: Design placement and scheduling algorithms for heterogeneous accelerators.
  • Resilience & Failover: Implement multi‑region/zone failover and traffic shifting for resilience and cost control.
  • Autoscaling & Routing: Build autoscaling, routing, and load balancing to meet throughput/latency SLOs.
  • Model Distribution: Optimize model distribution and cold‑start times across clusters.

Inference Optimization & Performance
  • Framework Development: Integrate and contribute to LLM inference frameworks such as vLLM, SGLang, TensorRT‑LLM.
  • Parallelism and Configuration Tuning: Optimize configurations for tensor/pipeline/expert parallelism, prefix caching, memory management and other axes for maximum performance.
  • End‑to‑End Performance: Profile kernels, memory bandwidth and transport; apply techniques such as quantization and speculative decoding.
  • Perf Suites: Develop reproducible performance suites (latency, throughput, context length, batch size, precision).
  • RL Integration: Embed and optimize distributed inference within our RL stack.

Platform & Tooling
  • CI/CD: Establish CI/CD with artifact promotion, performance gates, and reproducible builds.
  • Observability: Build metrics, logs, tracing; structured incident response and SLO management.
  • Docs & Collaboration: Document architectures, playbooks, and API contracts; mentor and collaborate cross‑functionally.

Technical Requirements

Required Experience
  • Building ML Systems at Scale: 3+ years building and running large‑scale ML/LLM services with clear latency/availability SLOs.
  • Inference Backends: Hands‑on with at least one of vLLM, SGLang, TensorRT‑LLM.
  • Distributed Serving Infra: Familiarity with distributed and disaggregated serving infrastructure such as NVIDIA Dynamo.
  • Inference Internals: Deep understanding of prefill vs. decode, KV‑cache behavior, batching, sampling, speculative decoding, parallelism strategies.
  • Full‑Stack Debugging: Comfortable debugging CUDA/NCCL, drivers/kernels, containers, service mesh/networking, and storage, owning incidents end‑to‑end.

Infrastructure Skills
  • Python: Systems tooling and backend services.
  • PyTorch: LLM Inference engine development and integration, deployment readiness.
  • Cloud & Automation: AWS/GCP service experience, cloud deployment patterns.
  • Kubernetes: Running infrastructure at scale with containers on Kubernetes.
  • GPU & Networking: Architecture, CUDA runtime, NCCL, InfiniBand; GPU‑aware bin‑packing and scheduling across heterogeneous fleets.

Nice to Have
  • Kernel‑Level Optimization: Familiarity with CUDA/Triton kernel development; Nsight Systems/Compute profiling.
  • Systems Performance Languages: Rust, C++.
  • Data & Observability: Kafka/PubSub, Redis, gRPC/Protobuf; Prometheus/Grafana, OpenTelemetry; reliability patterns.
  • Infra & Config Automation: Terraform/Ansible, infrastructure‑as‑code, reproducible environments.
  • Open Source: Contributions to serving, inference, or RL infrastructure projects.

What we offer
  • Cash Compensation Range of $150-300k with significant equity incentives.
  • Flexible work arrangement (remote or San Francisco office).
  • Full visa sponsorship and relocation support.
  • Professional development budget.
  • Regular team off‑sites and conference attendance.
  • Opportunity to shape decentralized AI and RL at Prime Intellect.

Growth Opportunity

You'll join a team of experienced engineers and researchers working on cutting‑edge problems in AI infrastructure. We believe in open development and encourage team members to contribute to the broader AI community through research and open‑source contributions. We value potential over perfection. If you're passionate about democratizing AI development, we want to talk to you.

Vacancy posted 3 days ago
Similar jobs that could be interesting for you
Based on the Member of Technical Staff - Inference in San Francisco, CA vacancy
  • $180k

     ...Member Of Technical Staff - Inference Palo Alto, CA About Xai Xai's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence... 
    Suggested
    Temporary work

    Xai

    San Francisco, CA
    2 days ago
  • $200k - $350k

     ...Member Of Technical Staff, Inference & Serving Inception creates the world's fastest, most efficient AI models. Our Mercury model is the world's fastest reasoning LLM and first commercially available diffusion LLM, delivering 5x greater speed and efficiency than today... 
    Suggested
    Immediate start
    Flexible hours

    Inception LLC

    San Francisco, CA
    4 days ago
  • Member of Technical Staff, ML Infrastructure & Inference Overview A cutting-edge AI infrastructure company is building a scalable cloud platform designed for next-generation machine learning workloads ($80M Series A). As AI systems continue to grow in complexity... 
    Suggested

    Acceler8 Talent

    San Francisco, CA
    3 days ago
  •  ...production workloads built to scale to gigawatt‑class AI datacenters. Mission Gimlet Labs is seeking a Member of Technical Staff focused on ML systems and inference. In this role, you will design and build inference systems that execute full models end‑to‑end under real... 
    Suggested

    Gimlet Labs, Inc.

    San Francisco, CA
    4 days ago
  •  ...Pixeltable Inc. Member of Technical Staff San Francisco, CA·Full time Apply for Member of Technical Staff As a founding member of the engineering...  ...ingestion, transformation, training/fine-tuning, and inference? You will also: Find opportunities to go deep into a wide... 
    Suggested
    Full time
    Part time
    Work at office
    Work from home
    Flexible hours
    2 days per week

    Pixeltable, Inc.

    San Francisco, CA
    3 days ago
  • $200k

     ...pre-training, domain-specific RL, ultra-long context, and inference-time compute to achieve this goal. About the role Evals...  ...of many of the company's most important decisions. As a Member of Technical Staff on Evals, you will build both the platform and the evaluations... 
    Visa sponsorship
    Relocation package

    Magic AI Corp.

    San Francisco, CA
    4 days ago
  •  ...Member Of Technical Staff We're looking for a member of technical staff to build and deploy production-grade AI systems. In this role, you...  ...world applications Design scalable pipelines for training, inference, and data processing Improve latency, throughput, cost... 

    ERAGON

    San Francisco, CA
    3 days ago
  •  ...exceptional people to help us get there. The Opportunity Our Edge Inference team compiles Liquid Foundation Models into optimized machine...  ...on-device AI possible. You will work directly with the technical lead on problems that require deep understanding of both ML architectures... 

    Liquid AI

    San Francisco, CA
    1 day ago
  •  ...great products. Join us on our mission and shape the future! Member of Technical Staff, Search Why this role? We are looking for talented...  ...Work closely with the model serving team to ensure that inference is fast and stable. Collaborate with product teams to develop... 
    Full time
    Work at office
    Remote work
    Flexible hours

    Cohere

    San Francisco, CA
    3 days ago
  • $300k

    Member of Technical Staff - RL Infrastructure About V max V max is an applied research lab developing AI capable of open-ended learning. We are...  ...at scale: distributed rollouts, training orchestration, inference, evals, data pipelines, observability, and reliability. You... 
    Work at office
    Local area

    Vmax

    San Francisco, CA
    5 days ago
  •  ...pointing ours at the frontier of science. Role Overview As a Member of Technical Staff you will shape Conductor's core offerings: AI software...  ...Build back‑end services for data collection, labelling, and inference. Integrate with external systems for secure, reliable... 

    Conductor Quantum

    San Francisco, CA
    1 day ago
  • $170k - $220k

    Member of Technical Staff - Infrastructure & LLMs Location: San Francisco, CA (Hybrid) Compensation: $170,000 - $220,000 base + 1-3% equity...  ...join a lean, high-performance team building next-generation inference infrastructure for LLMs. This is an opportunity to own the... 
    Full time
    Temporary work
    Immediate start
    Visa sponsorship
    Work visa

    Amadeus Search

    San Francisco, CA
    5 days ago
  • What we are looking for? Seeking a Member of Technical Staff - Backend with 5+ years of experience. We are looking for an exceptional builder...  ...scalability of output Design and build the integration of ML inference, monitoring systems, LLM interactions, application layers,... 
    Work experience placement

    RST Recruitment

    San Francisco, CA
    3 days ago
  •  ...boundaries of what's possible in robotic intelligence. As a Member of Technical Staff, you'll be at the forefront of developing breakthrough...  ...end‑to‑end vision‑language‑action models, efficient model inference, video tokenization Design and implement novel deep learning... 
    Local area

    Amazon Science

    San Francisco, CA
    5 days ago
  •  ...contributions to developer tools or AI/ML repositories (Desirable) Inference & Hardware Knowledge: Interest in the hardware side of AI—...  ...end‑to‑end What the job involves We are seeking a Member of Technical Staff, Evals & Post‑Training Product to help define how... 

    Fireworks AI

    San Francisco, CA
    4 days ago
  •  ...future of AI. About the role Gimlet Labs is seeking a Member of Technical Staff focused on AI research. As an AI Researcher, you...  ...new model architectures and experimenting with novel inference efficiency techniques such as KV caching and FlashAttention... 

    Gimlet Labs

    San Francisco, CA
    5 days ago
  • $150k - $280k

     ...Member of Technical Staff (Backend) San Francisco, CA Compensation: $150,000 – $280,000 + Competitive Equity Type: Full-Time Visa...  ...millions of transactions on AWS, including: - Distributed inference - Caching - Queue orchestration - Self-healing... 
    Full time
    Temporary work
    H1b
    Work at office
    Visa sponsorship
    Relocation package

    Fuku

    San Francisco, CA
    3 days ago
  • $150k - $300k

     ...infrastructure that runs the jobs. Core Technical Responsibilities Hosted Training...  ...operate Kubernetes-based training and inference orchestration across multi-cluster,...  ...in open development and encourage team members to contribute to the broader AI community... 
    Work at office
    Local area
    Remote work
    Visa sponsorship
    Relocation package
    Flexible hours

    Prime Intellect

    San Francisco, CA
    1 day ago
  • $256k - $276k

     ...picture and our vision at Postman. The Opportunity As a Member of Technical Staff on AI Infrastructure, you will build and maintain the...  ...infrastructure that power AI model post training, inference, and data pipelines. You will collaborate with engineering... 
    Work at office
    Flexible hours
    3 days per week

    Postman

    San Francisco, CA
    2 days ago
  • Member of Technical Staff, ML Systems Mirendil Mirendil is a tech-first company focused on solving core bottlenecks that unlock step-change acceleration...  ...on (not limited to): Building and scaling training and inference infrastructure (potentially for various chips across... 

    Mirendil

    San Francisco, CA
    1 day ago
  • Member of Technical Staff - Post‑Training Join to apply for the Member of Technical Staff - Post‑Training role at Reflection AI . Our Mission...  ...pipelines, reward models, reinforcement learning algorithms, and inference‑time scaling techniques. Collaborate across pre‑training... 
    Full time
    Relocation package

    Reflection AI

    San Francisco, CA
    4 days ago
  • The opportunity We are looking for a Member of Technical Staff with deep expertise in generative modelling to work at the interface between our...  ...of generative model architectures, training dynamics and inference behaviour. You are a skilful ML developer. You write ML code... 
    Flexible hours

    Gravity Engineering Services Pvt Ltd.

    San Francisco, CA
    2 days ago
  •  ...Activant, 1984 Ventures and Page One. The Role We’re hiring a Member of Technical Staff - AI/ML to design, build, and deploy AI-powered systems...  ...: Develop robust AI pipelines from data ingestion through inference, ensuring reliability, scalability, and maintainability.... 
    Full time
    Flexible hours

    Stuut

    San Francisco, CA
    9 days ago
  •  ...Inference Engine Engineer We build and run the inference engine behind every Perplexity query and deploy dozens of model architectures at scale with tight latency and cost budgets. Our stack is Rust, Python, CUDA, and CuTe DSL - and we need another engineer to join... 

    Perplexity AI

    San Francisco, CA
    6 days ago
  • Member of Technical Staff — AI/ML Engineering (Financial Technology) Build intelligent systems that redefine how businesses manage financial operations...  ...pipelines that support data ingestion, model training, inference, and monitoring while ensuring high availability and... 
    Full time
    Flexible hours

    Andiamo

    San Francisco, CA
    5 days ago
  • Member of Technical Staff - Agents at Prime Intellect - San Francisco Building the Future of Open Source + Decentralized AI At Prime Intellect...  ...Infrastructure : Understanding how to optimize agent training or inference on GPUs. Advanced AI/ML Knowledge : Familiarity with... 
    Remote work
    Flexible hours

    Victrays

    San Francisco, CA
    1 day ago
  • Member of Technical Staff — Voice & Audio AI Systems Build intelligent voice experiences that transform how businesses operate. A fast-growing...  ...latency pipelines that support audio ingestion, streaming inference, orchestration, and monitoring, ensuring consistent performance... 
    Full time
    Flexible hours

    Andiamo

    San Francisco, CA
    3 days ago
  •  ...frontier of interactive AI. The Role We’re looking for a Member of Technical Staff — Diffusion Models to help design and train the next generation...  ...AI familiarity Interactive generation systems Real-time inference optimization Graphics or game-engine experience... 

    Moonlake

    San Francisco, CA
    4 days ago
  •  ...Moonlake is hiring a Member of Technical Staff — Diffusion Models to design and train advanced multimodal generative systems. This role focuses on developing diffusion architectures and large-scale training processes to enhance interactive world generation. The ideal candidate... 

    Moon Lake

    San Francisco, CA
    3 days ago
  •  ...AI frontier — you won't just observe the cutting edge of AI, your work will define what cutting edge means. We're hiring Members of Technical Staff to design the evaluations that set the standard for how AI is measured, produce analysis that shapes how companies and the... 

    Artificial Analysis, Inc.

    San Francisco, CA
    1 day ago
