Member of Technical Staff - Inference

$150k - $300k

Prime Intellect

Prime Intellect is building the open superintelligence stack - from frontier agentic models to the infra that enables anyone to create, train, and deploy them. We aggregate and orchestrate global compute into a single control plane and pair it with the full RL post‑training stack: environments, secure sandboxes, verifiable evals, and our async RL trainer. We enable researchers, startups and enterprises to run end‑to‑end reinforcement learning at frontier scale, adapting models to real tools, workflows, and deployment contexts.

Role Impact

This is a hybrid position spanning cloud LLM serving, LLM inference optimization and RL systems. You will be working on advancing our ability to evaluate and serve models trained with our RL Lab at scale. The two key areas are:

Building the infrastructure to serve LLMs efficiently at scale.
Optimization and integration of inference systems into our RL training stack.

Core Technical Responsibilities

LLM Serving

Multi‑tenant LLM Serving: Build a multi‑tenant LLM serving platform that operates across our cloud GPU fleets.
GPU‑Aware Scheduling: Design placement and scheduling algorithms for heterogeneous accelerators.
Resilience & Failover: Implement multi‑region/zone failover and traffic shifting for resilience and cost control.
Autoscaling & Routing: Build autoscaling, routing, and load balancing to meet throughput/latency SLOs.
Model Distribution: Optimize model distribution and cold‑start times across clusters.

Inference Optimization & Performance

Framework Development: Integrate and contribute to LLM inference frameworks such as vLLM, SGLang, TensorRT‑LLM.
Parallelism and Configuration Tuning: Optimize configurations for tensor/pipeline/expert parallelism, prefix caching, memory management and other axes for maximum performance.
End‑to‑End Performance: Profile kernels, memory bandwidth and transport; apply techniques such as quantization and speculative decoding.
Perf Suites: Develop reproducible performance suites (latency, throughput, context length, batch size, precision).
RL Integration: Embed and optimize distributed inference within our RL stack.

Platform & Tooling

CI/CD: Establish CI/CD with artifact promotion, performance gates, and reproducible builds.
Observability: Build metrics, logs, tracing; structured incident response and SLO management.
Docs & Collaboration: Document architectures, playbooks, and API contracts; mentor and collaborate cross‑functionally.

Technical Requirements

Required Experience

Building ML Systems at Scale: 3+ years building and running large‑scale ML/LLM services with clear latency/availability SLOs.
Inference Backends: Hands‑on with at least one of vLLM, SGLang, TensorRT‑LLM.
Distributed Serving Infra: Familiarity with distributed and disaggregated serving infrastructure such as NVIDIA Dynamo.
Inference Internals: Deep understanding of prefill vs. decode, KV‑cache behavior, batching, sampling, speculative decoding, parallelism strategies.
Full‑Stack Debugging: Comfortable debugging CUDA/NCCL, drivers/kernels, containers, service mesh/networking, and storage, owning incidents end‑to‑end.

Infrastructure Skills

Python: Systems tooling and backend services.
PyTorch: LLM Inference engine development and integration, deployment readiness.
Cloud & Automation: AWS/GCP service experience, cloud deployment patterns.
Kubernetes: Running infrastructure at scale with containers on Kubernetes.
GPU & Networking: Architecture, CUDA runtime, NCCL, InfiniBand; GPU‑aware bin‑packing and scheduling across heterogeneous fleets.

Nice to Have

Kernel‑Level Optimization: Familiarity with CUDA/Triton kernel development; Nsight Systems/Compute profiling.
Systems Performance Languages: Rust, C++.
Data & Observability: Kafka/PubSub, Redis, gRPC/Protobuf; Prometheus/Grafana, OpenTelemetry; reliability patterns.
Infra & Config Automation: Terraform/Ansible, infrastructure‑as‑code, reproducible environments.
Open Source: Contributions to serving, inference, or RL infrastructure projects.

What we offer

Cash Compensation Range of $150-300k with significant equity incentives.
Flexible work arrangement (remote or San Francisco office).
Full visa sponsorship and relocation support.
Professional development budget.
Regular team off‑sites and conference attendance.
Opportunity to shape decentralized AI and RL at Prime Intellect.

Growth Opportunity

You'll join a team of experienced engineers and researchers working on cutting‑edge problems in AI infrastructure. We believe in open development and encourage team members to contribute to the broader AI community through research and open‑source contributions. We value potential over perfection. If you're passionate about democratizing AI development, we want to talk to you.


Vacancy posted 3 days ago

