Member of Technical Staff - Inference
$150k - $300k
Prime Intellect
Prime Intellect is building the open superintelligence stack - from frontier agentic models to the infra that enables anyone to create, train, and deploy them. We aggregate and orchestrate global compute into a single control plane and pair it with the full RL post-training stack: environments, secure sandboxes, verifiable evals, and our async RL trainer. We enable researchers, startups and enterprises to run end-to-end reinforcement learning at frontier scale, adapting models to real tools, workflows, and deployment contexts.

Role Impact

This is a hybrid position spanning cloud LLM serving, LLM inference optimization and RL systems. You will be working on advancing our ability to evaluate and serve models trained with our RL Lab at scale. The two key areas are:

- Building the infrastructure to serve LLMs efficiently at scale.
- Optimization and integration of inference systems into our RL training stack.

Core Technical Responsibilities

LLM Serving

- Multi-tenant LLM Serving: Build a multi-tenant LLM serving platform that operates across our cloud GPU fleets.
- GPU-Aware Scheduling: Design placement and scheduling algorithms for heterogeneous accelerators.
- Resilience & Failover: Implement multi-region/zone failover and traffic shifting for resilience and cost control.
- Autoscaling & Routing: Build autoscaling, routing, and load balancing to meet throughput/latency SLOs.
- Model Distribution: Optimize model distribution and cold-start times across clusters.

Inference Optimization & Performance

- Framework Development: Integrate and contribute to LLM inference frameworks such as vLLM, SGLang, TensorRT-LLM.
- Parallelism and Configuration Tuning: Optimize configurations for tensor/pipeline/expert parallelism, prefix caching, memory management and other axes for maximum performance.
- End-to-End Performance: Profile kernels, memory bandwidth and transport; apply techniques such as quantization and speculative decoding.
- Perf Suites: Develop reproducible performance suites (latency, throughput, context length, batch size, precision).
- RL Integration: Embed and optimize distributed inference within our RL stack.

Platform & Tooling

- CI/CD: Establish CI/CD with artifact promotion, performance gates, and reproducible builds.
- Observability: Build metrics, logs, tracing; structured incident response and SLO management.
- Docs & Collaboration: Document architectures, playbooks, and API contracts; mentor and collaborate cross-functionally.

Technical Requirements

Required Experience

- Building ML Systems at Scale: 3+ years building and running large-scale ML/LLM services with clear latency/availability SLOs.
- Inference Backends: Hands-on with at least one of vLLM, SGLang, TensorRT-LLM.
- Distributed Serving Infra: Familiarity with distributed and disaggregated serving infrastructure such as NVIDIA Dynamo.
- Inference Internals: Deep understanding of prefill vs. decode, KV-cache behavior, batching, sampling, speculative decoding, parallelism strategies.
- Full-Stack Debugging: Comfortable debugging CUDA/NCCL, drivers/kernels, containers, service mesh/networking, and storage, owning incidents end-to-end.

Infrastructure Skills

- Python: Systems tooling and backend services.
- PyTorch: LLM inference engine development and integration, deployment readiness.
- Cloud & Automation: AWS/GCP service experience, cloud deployment patterns.
- Kubernetes: Running infrastructure at scale with containers on Kubernetes.
- GPU & Networking: Architecture, CUDA runtime, NCCL, InfiniBand; GPU-aware bin-packing and scheduling across heterogeneous fleets.

Nice to Have

- Kernel-Level Optimization: Familiarity with CUDA/Triton kernel development; Nsight Systems/Compute profiling.
- Systems Performance Languages: Rust, C++.
- Data & Observability: Kafka/PubSub, Redis, gRPC/Protobuf; Prometheus/Grafana, OpenTelemetry; reliability patterns.
- Infra & Config Automation: Terraform/Ansible, infrastructure-as-code, reproducible environments.
- Open Source: Contributions to serving, inference, or RL infrastructure projects.

What we offer

- Cash compensation range of $150k-$300k with significant equity incentives.
- Flexible work arrangement (remote or San Francisco office).
- Full visa sponsorship and relocation support.
- Professional development budget.
- Regular team off-sites and conference attendance.
- Opportunity to shape decentralized AI and RL at Prime Intellect.

Growth Opportunity

You'll join a team of experienced engineers and researchers working on cutting-edge problems in AI infrastructure. We believe in open development and encourage team members to contribute to the broader AI community through research and open-source contributions.

We value potential over perfection. If you're passionate about democratizing AI development, we want to talk to you.

