Member of Technical Staff - Inference

Prime Intellect

Building Open Superintelligence Infrastructure

Prime Intellect is building the open superintelligence stack - from frontier agentic models to the infra that enables anyone to create, train, and deploy them. We aggregate and orchestrate global compute into a single control plane and pair it with the full RL post-training stack: environments, secure sandboxes, verifiable evals, and our async RL trainer. We enable researchers, startups, and enterprises to run end-to-end reinforcement learning at frontier scale, adapting models to real tools, workflows, and deployment contexts.

We recently raised $15mm in funding (total of $20mm raised) led by Founders Fund, with participation from Menlo Ventures and prominent angels including Andrej Karpathy (Eureka AI, Tesla, OpenAI), Tri Dao (Chief Scientific Officer of Together AI), Dylan Patel (SemiAnalysis), Clem Delangue (Hugging Face), Emad Mostaque (Stability AI), and many others.

Role Impact

This is a hybrid position spanning cloud LLM serving, LLM inference optimization and RL systems. You will be working on advancing our ability to evaluate and serve models trained with our RL Lab at scale. The two key areas are:

Building the infrastructure to serve LLMs efficiently at scale.
Optimization and integration of inference systems into our RL training stack.

Core Technical Responsibilities

LLM Serving

Multi-Tenant LLM Serving: Build a multi-tenant LLM serving platform that operates across our cloud GPU fleets.
GPU-Aware Scheduling: Design placement and scheduling algorithms for heterogeneous accelerators.
Resilience & Failover: Implement multi-region/zone failover and traffic shifting for resilience and cost control.
Autoscaling & Routing: Build autoscaling, routing, and load balancing to meet throughput/latency SLOs.
Model Distribution: Optimize model distribution and cold-start times across clusters.

Inference Optimization & Performance

Framework Development: Integrate and contribute to LLM inference frameworks such as vLLM, SGLang, and TensorRT-LLM.
Parallelism and Configuration Tuning: Optimize configurations for tensor/pipeline/expert parallelism, prefix caching, memory management and other axes for maximum performance.
End-to-End Performance: Profile kernels, memory bandwidth, and transport; apply techniques such as quantization and speculative decoding.
Perf Suites: Develop reproducible performance suites (latency, throughput, context length, batch size, precision).
RL Integration: Embed and optimize distributed inference within our RL stack.

Platform & Tooling

CI/CD: Establish CI/CD with artifact promotion, performance gates, and reproducible builds.
Observability: Build metrics, logs, tracing; structured incident response and SLO management.
Docs & Collaboration: Document architectures, playbooks, and API contracts; mentor and collaborate cross-functionally.

Technical Requirements

Required Experience

Building ML Systems at Scale: 3+ years building and running large-scale ML/LLM services with clear latency/availability SLOs.
Inference Backends: Hands-on experience with at least one of vLLM, SGLang, or TensorRT-LLM.
Distributed Serving Infra: Familiarity with distributed and disaggregated serving infrastructure such as NVIDIA Dynamo.
Inference Internals: Deep understanding of prefill vs. decode, KV-cache behavior, batching, sampling, speculative decoding, and parallelism strategies.
Full-Stack Debugging: Comfortable debugging CUDA/NCCL, drivers/kernels, containers, service mesh/networking, and storage, owning incidents end-to-end.

Infrastructure Skills

Python: Systems tooling and backend services.
PyTorch: LLM inference engine development and integration, deployment readiness.
Cloud & Automation: AWS/GCP service experience, cloud deployment patterns.
Kubernetes: Running infrastructure at scale with containers on Kubernetes.
GPU & Networking: Architecture, CUDA runtime, NCCL, InfiniBand; GPU-aware bin-packing and scheduling across heterogeneous fleets.

Nice to Have

Kernel-Level Optimization: Familiarity with CUDA/Triton kernel development; Nsight Systems/Compute profiling.
Systems Performance Languages: Rust, C++.
Data & Observability: Kafka/PubSub, Redis, gRPC/Protobuf; Prometheus/Grafana, OpenTelemetry; reliability patterns.
Infra & Config Automation: Terraform/Ansible, infrastructure-as-code, reproducible environments.
Open Source: Contributions to serving, inference, or RL infrastructure projects.

What We Offer

Cash compensation range of $150-300k, with significant equity incentives
Flexible work arrangement (remote or San Francisco office)
Full visa sponsorship and relocation support
Professional development budget
Regular team off-sites and conference attendance
Opportunity to shape decentralized AI and RL at Prime Intellect

Growth Opportunity

You'll join a team of experienced engineers and researchers working on cutting-edge problems in AI infrastructure. We believe in open development and encourage team members to contribute to the broader AI community through research and open-source contributions.

We value potential over perfection. If you're passionate about democratizing AI development, we want to talk to you.

Ready to help shape the future of AI? Apply now and join us in our mission to make powerful AI models accessible to everyone.

Vacancy posted 19 hours ago
