Member of Technical Staff - Inference
Prime Intellect
Building Open Superintelligence Infrastructure
Prime Intellect is building the open superintelligence stack - from frontier agentic models to the infra that enables anyone to create, train, and deploy them. We aggregate and orchestrate global compute into a single control plane and pair it with the full RL post-training stack: environments, secure sandboxes, verifiable evals, and our async RL trainer. We enable researchers, startups and enterprises to run end-to-end reinforcement learning at frontier scale, adapting models to real tools, workflows, and deployment contexts.
We recently raised $15mm in funding (total of $20mm raised) led by Founders Fund, with participation from Menlo Ventures and prominent angels including Andrej Karpathy (Eureka AI, Tesla, OpenAI), Tri Dao (Chief Scientific Officer of Together AI), Dylan Patel (SemiAnalysis), Clem Delangue (Hugging Face), Emad Mostaque (Stability AI) and many others.
Role Impact
This is a hybrid position spanning cloud LLM serving, LLM inference optimization and RL systems. You will work on advancing our ability to evaluate and serve models trained with our RL Lab at scale. The two key areas are:
- Building the infrastructure to serve LLMs efficiently at scale.
- Optimization and integration of inference systems into our RL training stack.
- Multi-Tenant LLM Serving: Build a multi-tenant LLM serving platform that operates across our cloud GPU fleets.
- GPU-Aware Scheduling: Design placement and scheduling algorithms for heterogeneous accelerators.
- Resilience & Failover: Implement multi-region/zone failover and traffic shifting for resilience and cost control.
- Autoscaling & Routing: Build autoscaling, routing, and load balancing to meet throughput/latency SLOs.
- Model Distribution: Optimize model distribution and cold-start times across clusters.
- Framework Development: Integrate and contribute to LLM inference frameworks such as vLLM, SGLang, TensorRT-LLM.
- Parallelism and Configuration Tuning: Optimize configurations for tensor/pipeline/expert parallelism, prefix caching, memory management and other axes for maximum performance.
- End-to-End Performance: Profile kernels, memory bandwidth and transport; apply techniques such as quantization and speculative decoding.
- Perf Suites: Develop reproducible performance suites (latency, throughput, context length, batch size, precision).
- RL Integration: Embed and optimize distributed inference within our RL stack.
- CI/CD: Establish CI/CD with artifact promotion, performance gates, and reproducible builds.
- Observability: Build metrics, logs, tracing; structured incident response and SLO management.
- Docs & Collaboration: Document architectures, playbooks, and API contracts; mentor and collaborate cross-functionally.
- Building ML Systems at Scale: 3+ years building and running large-scale ML/LLM services with clear latency/availability SLOs.
- Inference Backends: Hands-on with at least one of vLLM, SGLang, TensorRT-LLM.
- Distributed Serving Infra: Familiarity with distributed and disaggregated serving infrastructure such as NVIDIA Dynamo.
- Inference Internals: Deep understanding of prefill vs. decode, KV-cache behavior, batching, sampling, speculative decoding, parallelism strategies.
- Full-Stack Debugging: Comfortable debugging CUDA/NCCL, drivers/kernels, containers, service mesh/networking, and storage, owning incidents end-to-end.
- Python: Systems tooling and backend services.
- PyTorch: LLM inference engine development and integration, deployment readiness.
- Cloud & Automation: AWS/GCP service experience, cloud deployment patterns.
- Kubernetes: Running infrastructure at scale with containers on Kubernetes.
- GPU & Networking: Architecture, CUDA runtime, NCCL, InfiniBand; GPU-aware bin-packing and scheduling across heterogeneous fleets.
- Kernel-Level Optimization: Familiarity with CUDA/Triton kernel development; Nsight Systems/Compute profiling.
- Systems Performance Languages: Rust, C++.
- Data & Observability: Kafka/PubSub, Redis, gRPC/Protobuf; Prometheus/Grafana, OpenTelemetry; reliability patterns.
- Infra & Config Automation: Terraform/Ansible, infrastructure-as-code, reproducible environments.
- Open Source: Contributions to serving, inference, or RL infrastructure projects.
- Cash compensation range of $150-300k with significant equity incentives
- Flexible work arrangement (remote or San Francisco office)
- Full visa sponsorship and relocation support
- Professional development budget
- Regular team off-sites and conference attendance
- Opportunity to shape decentralized AI and RL at Prime Intellect
Vacancy posted 19 hours ago

