Member of Technical Staff (AI Inference Engineer)

$220k

Perplexity

We build and run the inference engine behind every Perplexity query and deploy dozens of model architectures at scale with tight latency and cost budgets. Our stack is Rust, Python, CUDA, and CuTe DSL - and we need another engineer to join us.

What you will work on

Examples of real work the team does:
New model support. Support transformer-based retrieval, text-generation, and multimodal models in our inference infrastructure, from weight loading, request scheduling, and KV-cache management to support in the API Gateway.
GPU kernel migration to CuTe DSL. Port our in-house CUDA kernels to NVIDIA's CuTe DSL so they run on GB200 today and are portable to Vera Rubin racks tomorrow.
Rust-native serving runtime. Develop our internal Rust-based inference server to solve our Python pain points and keep up with rapidly growing traffic.
Performance optimization. Profile and fix bottlenecks from network ingress through continuous batching and GPU kernel interleaving.
Reliability and observability. Build dashboards, alerts, and automated remediation so we catch regressions before users do. Respond to and learn from production incidents.

Who we're looking for

Deep experience with GPU programming and performance work (CUDA, Triton, CUTLASS, or similar). Any other deep systems programming experience is a plus.
You understand modern LLM architectures and are able to bring them up reliably in a production environment.
You've built and operated production distributed systems under real load - ideally performance-critical ones.
Comfortable working across languages and layers: Rust for the serving runtime, Python for model code, CUDA/CuTe DSL for kernels.
You own problems end-to-end. You can read a research paper on Monday, write a kernel on Wednesday, and debug a production incident on Friday.
Self-directed. You do well in fast-moving environments where the path forward isn't laid out for you.

Good if you have touched any of the following:
ML compilers and framework internals: PyTorch internals, torch.compile, custom operators.
Distributed GPU communication: NCCL, NVLink, InfiniBand, RDMA libraries, model/tensor parallelism.
Low-precision inference: INT8/FP8/FP4 quantization, mixed-precision serving.
Profiling and debugging tools: Nsight Compute/Systems, CUDA-GDB, PTX/SASS analysis.
Container orchestration: Kubernetes, GPU scheduling, autoscaling inference workloads.

Qualifications

3+ years of professional software engineering experience with meaningful work on ML inference or high-performance systems.
Familiarity with at least one deep learning framework (PyTorch, JAX, TensorFlow).
Understanding of GPU architectures (memory hierarchy, warp scheduling, tensor cores).
Understanding of common LLM architectures and inference optimization techniques (e.g. quantization, speculative decoding, prefill-decode disaggregation).

Compensation Range: $220K - $485K

Vacancy posted 3 days ago

Similar jobs that could be interesting for you
Based on the Member of Technical Staff (AI Inference Engineer) in San Francisco, CA vacancy

Member of Technical Staff - ML Systems & Inference
Member of Technical Staff - ML Systems & Inference Join a well-funded AI infrastructure startup building the orchestration layer for next-generation AI workloads This... ..., distributed systems, and performance engineering. You'll build production inference systems, optimize...
Suggested
Acceler8 Talent
San Francisco, CA
2 days ago
Member of Technical Staff - Inference
$150k - $300k
...cloud LLM serving, LLM inference optimization and RL systems... ...training stack. Core Technical Responsibilities LLM... ...PyTorch: LLM Inference engine development and integration... ...to shape decentralized AI and RL at Prime... ...development and encourage team members to contribute to the...
Suggested
Work at office
Remote work
Visa sponsorship
Relocation package
Flexible hours
Shift work
Prime Intellect
San Francisco, CA
3 days ago
Member of Technical Staff, Inference
Member of Technical Staff — ML Systems & Inference Employment Type: Full-time Workplace: On-site About the Company We are... ...layer for the next generation of AI infrastructure. As AI workloads... ...company. As an early member of the engineering team, you will help define the systems...
Suggested
Full time
Acceler8 Talent
San Francisco, CA
15 hours ago
Member of Technical Staff, ML Infrastructure & Inference
Member of Technical Staff, ML Infrastructure & Inference Overview We are a cutting-edge AI infrastructure company building a scalable cloud platform designed for next-generation... .... This opportunity is well suited to engineers who understand how modern models execute at...
Suggested
Acceler8 Talent
San Francisco, CA
3 days ago
Member of Technical Staff - ML Systems & Inference
...first heterogeneous neocloud for AI workloads. As AI systems scale,... ...datacenters. Gimlet Labs is seeking a Member of Technical Staff focused on ML systems and inference. In this role, you will design... .... This role is ideal for engineers who deeply understand how modern...
Suggested
Gimlet Labs
San Francisco, CA
1 day ago
Member of Technical Staff — AI / Software Engineering (US)
...use Cekura to test, monitor, debug, and improve AI agents across voice, chat, SMS, phone, and web.... ...the real world. About the Role We’re hiring a Member of Technical Staff to build the core of Cekura — the simulation engines, evaluation systems, and observability...
Cekura
San Francisco, CA
1 day ago
Member of Technical Staff - Applied AI Engineer
Member of Technical Staff - Applied AI Engineer Valthos | Posted Mar 3 Full-time Negotiable Advanced (5-10 yrs) Valthos Inc. Valthos is an applied biological intelligence company. We build and deploy software and biological AI systems to safeguard humanity. Applied...
Full time
Work at office
Valthos
San Francisco, CA
1 day ago
Member of Technical Staff, Inference & RL Systems
$225k
...Our approach combines frontier‑scale pre‑training, domain‑specific RL, ultra‑long context, and inference‑time compute to achieve this goal. About The Role As a Software Engineer on the Inference & RL Systems team, you will design and operate the distributed systems that...
Relocation
Visa sponsorship
Magic
San Francisco, CA
4 days ago
Member of Technical Staff, Inference
$350k
...Our first goal is to democratize frontier AI R&D across scientific disciplines. We... ...experiments. Our team includes researchers and engineers from Anthropic, Google DeepMind, xAI,... ...We are looking for an engineer to own the inference systems that power our models in production...
Mirendil
San Francisco, CA
3 days ago
Member of Technical Staff - Inference
...parallelism strategies, and help us squeeze every FLOP out of our hardware. What you’ll do Modify and extend state-of-the-art inference engines like vLLM and SGLang. Understand every microsecond of GPU time spent during a forward pass. You'll be able to explain every...
Sail Research
San Francisco, CA
3 days ago
Member of Technical Staff - Inference Research
...whole life of an LLM - train it, deploy it, observe it - and inference is where teams feel the difference every day. We already run elastic... ...Work directly with customers alongside our Forward Deployed Engineers to deploy and tune models, and bring what you learn back into...
Work at office
Mixpeek
San Francisco, CA
4 days ago
Member of Technical Staff, Inference
$200k - $400k
About The Role We're looking for an inference runtime engineer to push the boundaries of what's possible... ...directly impact how the world runs AI inference. Skills And Qualifications... ...LlamaFactory, etc). Written widely-shared technical blogs or side projects on vLLM or LLM...
Remote work
Visa sponsorship
Shift work
Inferact
San Francisco, CA
1 day ago
Member of Technical Staff - Edge Inference Engineer
About Liquid AI Spun out of MIT CSAIL, we build general-purpose... .... The Opportunity Our Edge Inference team compiles Liquid... ...will work directly with the technical lead on problems that require... ...proficiency Embedded software engineering experience or work on resource...
Bold Capital Partners
San Francisco, CA
3 days ago
Member of Technical Staff, AI Reliability & Monitoring Engineering Lead
$256k - $276k
Overview Member of Technical Staff, AI Reliability & Monitoring Engineering Lead — Postman Join to apply for the Member of Technical Staff, AI Reliability & Monitoring Engineering Lead role at Postman. What You’ll Do Develop and manage reliability metrics (SLOs) for AI...
Full time
Work at office
Flexible hours
3 days per week
Postman
San Francisco, CA
4 days ago
Applied AI Inference - Forward Deployed Engineer
$165k - $330k
...Baseten powers mission-critical inference for dynamic AI companies, enabling models... ...Role As a Forward Deployed Engineer at Baseten, you will... ...aspects of product management, technical customer success, and pre-... ...these blog posts written by members of our Forward Deployed...
Work experience placement
Flexible hours
Baseten
San Francisco, CA
1 day ago
Member of Technical Staff (AI Software Engineer, Multimodal)
...is hiring builders to join our Multimodal AI group, an industry-leading team defining... ...modalities we have yet to invent. As an engineer on the Multimodal AI team, you will work... ...products end‑to‑end, from problem definition to technical design, implementation, and launch. Hill...
Perplexity AI Inc.
San Francisco, CA
2 days ago
Member of Technical Staff (AI Engineering)
$150k - $250k
...servicing with the industry’s most advanced AI credit-servicing agents. We are backed... ...Product Hunt), Charlie Songhurst (Board Member, Meta), and Michael Jones (Former Chair,... ...the United Nations, UChicago, and Oxford engineers and researchers. Our omnichannel...
Full time
Internship
Worldwide
Krew
San Francisco, CA
4 days ago
Member of Technical Staff
...Pixeltable Inc. Member of Technical Staff San Francisco, CA·Full time Apply for... ...As a founding member of the engineering team, you will impact the design... ...is revolutionizing the AI development landscape with... ...training/fine-tuning, and inference? You will also: Find opportunities...
Full time
Part time
Work at office
Work from home
Flexible hours
2 days per week
Pixeltable, Inc.
San Francisco, CA
3 days ago
Member of Technical Staff
$240k - $280k
...0/yr Direct message the job poster from Cabana Senior Engineer / Member of Technical Staff @ AI Healthcare Startup $240,000 - $280,000 You know how breakthrough... ...increase your chances of interviewing at Cabana by 2x Inferred from the description for this job 401(k) Get notified...
Full time
Remote work
Worldwide
Relocation
Cabana
San Francisco, CA
2 days ago
Member of Technical Staff (AI Software Engineer, Agents)
$200k - $300k
Location San Francisco Employment Type Full time Department AI Compensation $200K - $300K • Offers Equity U.S. Benefits Full‑time... ...the amounts listed above. Perplexity is seeking an energetic engineer to join our highly driven Comet Agents engineering team. The...
Full time
Flexible hours
B Capital
San Francisco, CA
4 days ago
Member of Technical Staff, Evals
$200k
...RL, ultra-long context, and inference-time compute to achieve... ...important decisions. As a Member of Technical Staff on Evals, you will build both... ...about helping researchers and engineers make better decisions... ...collaborative team working on frontier AI systems Magic strives to be...
Visa sponsorship
Relocation package
Magic
San Francisco, CA
4 days ago
Member of Technical Staff, Kernels
Member of Technical Staff — Kernels & GPU Performance Employment Type: Full-time... ...layer for the next era of AI infrastructure. As AI workloads... ...As an early member of the engineering team, you will help define... ...characteristics across the inference stack Partner with compiler...
Full time
Acceler8 Talent
San Francisco, CA
1 day ago
Evaluations - Member of Technical Staff
$200k - $400k
...that. We have built the first AI simulation of society,... ...Rauch. About the Role As a Member of Technical Staff, Model Evaluations at Simile... ...model training, and research engineering. Others may bring... ...Bayesian modeling, causal inference, psychometrics, polling, or...
Flexible hours
Simile
San Francisco, CA
15 hours ago
Member of Technical Staff
$250k
...an enterprise-grade AI platform that lets companies... ...The team is small, technical, and moving fast,... ...AI Tools. The Role Member of Technical Staff who can handle... ...pipelines for training, inference, and data processing... ...: Python; modern engineering / ML frameworks; AWS...
Full time
David Joseph & Company
San Francisco, CA
4 days ago
Member of Technical Staff, FAR (Frontier AI & Robotics)
$150k
...robotics at Amazon's Frontier AI & Robotics team, where you'... ...robotic intelligence. As a Member of Technical Staff, you’ll spearhead the... ...action models, efficient model inference, and video tokenization.... ...contributions. Collaborate with engineering teams to optimize and scale...
Local area
Amazon Science
San Francisco, CA
2 days ago
Member of Technical Staff
...Quantum superintelligence is an AI that uses quantum computers... ...of the world's software engineers. AI is already generating... ...science. Role Overview As a Member of Technical Staff you will shape Conductor's... ...collection, labelling, and inference. Integrate with external...
Conductor Quantum
San Francisco, CA
15 hours ago
Member of Technical Staff, Frontier AI & Robotics (FAR)
Amazon's Frontier AI & Robotics (FAR) team is seeking a Member of Technical Staff to drive foundational research and build intelligent robotic systems from the ground... ...vision‑language‑action models, efficient model inference, video tokenization. Design and implement novel...
PVH (Tommy Hilfiger/Calvin Klein)
San Francisco, CA
3 days ago
Sieve — Member of Technical Staff, Applied Research
$150k - $350k
Sieve — Member of Technical Staff, Applied Research Type: Full-time | On-site |... ...About Sieve Sieve is the only AI research lab exclusively... ...Role As an applied research engineer, you'll build high-performance... ..., parallelism, pipelining, inference optimization, and...
Full time
Work experience placement
H1b
Work at office
Visa sponsorship
davidjoseph-co
San Francisco, CA
4 days ago
Member of Technical Staff
...building the best way to talk to AI and humans together — where AI... ...day, and everyone talks to users. Member of Technical Staff is the title we use for engineers who own hard problems end to end... ..., fine-tuning, evaluation, inference, or RAG at scale High-performance...
Shapes, Inc
San Francisco, CA
1 day ago
Founding Member of Technical Staff, AI Infrastructure
Founding Member of Technical Staff, AI Infrastructure Location: San Francisco / Bay Area preferred... ...and easier to own by turning inference behavior, traces, workload replay, GPU... ...About the Role This is a broad founding engineering role for a senior builder who can move...
Full time
Remote work
Touchdown Labs, Inc.
San Francisco, CA
2 days ago