Member of Technical Staff (AI Inference Engineer)

Perplexity AI

Inference Engine Engineer

We build and run the inference engine behind every Perplexity query and deploy dozens of model architectures at scale with tight latency and cost budgets. Our stack is Rust, Python, CUDA, and CuTe DSL - and we need another engineer to join us.

What You Will Work On
  • New Model Support. Support transformer-based retrieval, text-generation, and multimodal models in our inference infrastructure, from weight loading, request scheduling, and KV-cache management to API Gateway support.

  • GPU Kernel Migration to CuTe DSL. Port our in-house CUDA kernels to NVIDIA's CuTe DSL so they run on GB200 today and are portable to Vera Rubin racks tomorrow.

  • Rust-native Serving Runtime. Develop our internal Rust-based inference server to solve all Python pains and keep up with rapidly growing traffic.

  • Performance Optimization. Profile and fix bottlenecks from network ingress through continuous batching and GPU kernel interleaving.

  • Reliability and Observability. Build dashboards, alerts, and automated remediation so we catch regressions before users do. Respond to and learn from production incidents.

Who We're Looking For
  • Deep experience with GPU programming and performance work (CUDA, Triton, CUTLASS, or similar). Any other deep systems programming experience is a plus.

  • You understand modern LLM architectures and are able to bring them up reliably in a production environment.

  • You've built and operated production distributed systems under real load - ideally performance-critical ones.

  • Comfortable working across languages and layers: Rust for the serving runtime, Python for model code, CUDA/CuTe DSL for kernels.

  • You own problems end-to-end. You can read a research paper on Monday, write a kernel on Wednesday, and debug a production incident on Friday.

  • Self-directed. You do well in fast-moving environments where the path forward isn't laid out for you.

Good If You Touched Any Of
  • ML Compilers and Framework Internals: PyTorch internals, torch.compile, custom operators.

  • Distributed GPU Communication: NCCL, NVLink, InfiniBand, RDMA libraries, model/tensor parallelism.

  • Low-precision Inference: INT8/FP8/FP4 quantization, mixed-precision serving.

  • Profiling and Debugging Tools: Nsight Compute/Systems, CUDA-GDB, PTX/SASS analysis.

  • Container Orchestration: Kubernetes, GPU scheduling, autoscaling inference workloads.

Qualifications
  • 3+ years of professional software engineering experience with meaningful work on ML inference or high-performance systems.

  • Familiarity with at least one deep learning framework (PyTorch, JAX, TensorFlow).

  • Understanding of GPU architectures (memory hierarchy, warp scheduling, tensor cores).

  • Understanding of common LLM architectures and inference optimization techniques (e.g. quantization, speculative decoding, prefill-decode disaggregation).

Vacancy posted 1 day ago