Member of Technical Staff (AI Inference Engineer)

$220k

Perplexity

We build and run the inference engine behind every Perplexity query and deploy dozens of model architectures at scale with tight latency and cost budgets. Our stack is Rust, Python, CUDA, and CuTe DSL, and we need another engineer to join us.

What you will work on

Examples of real work the team does:
  • New model support. Support transformer-based retrieval, text-generation, and multimodal models in our inference infrastructure, from weight loading, request scheduling, and KV-cache management to support in the API Gateway.
  • GPU kernel migration to CuTe DSL. Port our in-house CUDA kernels to NVIDIA's CuTe DSL so they run on GB200 today and are portable to Vera Rubin racks tomorrow.
  • Rust-native serving runtime. Develop our internal Rust-based inference server to remove the pain points of Python and keep up with rapidly growing traffic.
  • Performance optimisation. Profile and fix bottlenecks from network ingress through continuous batching and GPU kernel interleaving.
  • Reliability and observability. Build dashboards, alerts, and automated remediation so we catch regressions before users do. Respond to and learn from production incidents.

Who we're looking for
  • Deep experience with GPU programming and performance work (CUDA, Triton, CUTLASS, or similar). Any other deep systems programming experience is a plus.
  • You understand modern LLM architectures and are able to bring them up reliably in a production environment.
  • You've built and operated production distributed systems under real load, ideally performance-critical ones.
  • Comfortable working across languages and layers: Rust for the serving runtime, Python for model code, CUDA/CuTe DSL for kernels.
  • You own problems end-to-end. You can read a research paper on Monday, write a kernel on Wednesday, and debug a production incident on Friday.
  • Self-directed. You do well in fast-moving environments where the path forward isn't laid out for you.

Good if you have touched any of the following:
  • ML compilers and framework internals: PyTorch internals, torch.compile, custom operators.
  • Distributed GPU communication: NCCL, NVLink, InfiniBand, RDMA libraries, model/tensor parallelism.
  • Low-precision inference: INT8/FP8/FP4 quantization, mixed-precision serving.
  • Profiling and debugging tools: Nsight Compute/Systems, CUDA-GDB, PTX/SASS analysis.
  • Container orchestration: Kubernetes, GPU scheduling, autoscaling inference workloads.

Qualifications
  • 3+ years of professional software engineering experience with meaningful work on ML inference or high-performance systems.
  • Familiarity with at least one deep learning framework (PyTorch, JAX, TensorFlow).
  • Understanding of GPU architectures (memory hierarchy, warp scheduling, tensor cores).
  • Understanding of common LLM architectures and inference optimization techniques (e.g. quantization, speculative decoding, prefill-decode disaggregation).

Compensation Range: $220K - $485K

Vacancy posted 1 day ago
Similar jobs that could be interesting for you
Based on the Member of Technical Staff (AI Inference Engineer) in San Francisco, CA vacancy
  • $150k - $300k

     ...cloud LLM serving, LLM inference optimization and RL systems...  ...training stack. Core Technical Responsibilities LLM...  ...PyTorch: LLM Inference engine development and integration...  ...to shape decentralized AI and RL at Prime...  ...development and encourage team members to contribute to the... 
    Suggested
    Work at office
    Remote work
    Visa sponsorship
    Relocation package
    Flexible hours
    Shift work

    Prime Intellect

    San Francisco, CA
    5 days ago
  • $180k

     ...Member Of Technical Staff - Inference Palo Alto, CA About Xai Xai's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit...  ..., highly motivated, and focused on engineering excellence. This organization is for... 
    Suggested
    Temporary work

    Xai

    San Francisco, CA
    3 days ago
  •  ...first heterogeneous neocloud for AI workloads. As AI systems scale,...  ...Mission Gimlet Labs is seeking a Member of Technical Staff focused on ML systems and inference. In this role, you will design...  ...scalable. This role is ideal for engineers who deeply understand how modern... 
    Suggested

    Gimlet Labs, Inc.

    San Francisco, CA
    1 day ago
  • About the Role As a Member of Technical Staff, Inference at Radical Numerics, you will build and optimize the...  ...systems that bring frontier biological AI models into production. Your work...  ...optimization, GPU systems, and performance engineering. You should be excited by questions... 
    Suggested
    Local area

    Radical Numerics Inc.

    San Francisco, CA
    2 days ago
  • $100k - $300k

    About Cogent Security Cogent is an Applied AI Lab building the next generation of AI...  ...are looking for talented, ambitious AI/ML Engineers who are excited to build in the Applied AI...  ...Onboard, support and uplevel future team members Mentor and grow future junior team... 
    Suggested

    Cogent Security

    San Francisco, CA
    1 day ago
  • Member of Technical Staff - Applied AI Engineer Valthos | Posted Mar 3 Full-time Negotiable Advanced (5-10 yrs) Valthos Inc. Valthos is an applied biological intelligence company. We build and deploy software and biological AI systems to safeguard humanity. Applied... 
    Full time
    Work at office

    Valthos

    San Francisco, CA
    4 days ago
  •  ...balancing, and parallelism, Worked on low-level optimizations for inference, such as GPU kernels and code generation, Worked on...  ..., and low-precision numerics, Worked on large-scale inference engines or reinforcement learning frameworks, Worked on large-scale, high... 

    xAI

    San Francisco, CA
    4 days ago
  • Overview About Liquid AI Spun out of MIT CSAIL, we build general...  .... The Opportunity Our Edge Inference team compiles Liquid...  ...will work directly with the technical lead on problems that require...  ...Experience Embedded software engineering experience or work on resource... 

    Liquid AI

    San Francisco, CA
    3 days ago
  •  ...powers mission‑critical inference for the world's most dynamic AI companies, like Cursor, Notion...  ...help build the platform engineers turn to to ship AI...  ...aspects of product management, technical customer success, and pre...  ...blog posts written by members of our Forward Deployed Engineering... 
    Work experience placement
    Flexible hours

    Baseten

    San Francisco, CA
    1 day ago
  •  ...Pixeltable Inc. Member of Technical Staff San Francisco, CA·Full time Apply for...  ...As a founding member of the engineering team, you will impact the design...  ...is revolutionizing the AI development landscape with...  ...training/fine-tuning, and inference? You will also: Find opportunities... 
    Full time
    Part time
    Work at office
    Work from home
    Flexible hours
    2 days per week

    Pixeltable, Inc.

    San Francisco, CA
    1 day ago
  •  ...What we are looking for? Seeking a Member of Technical Staff - Backend with 5+ years of experience....  ...Design and build the integration of ML inference, monitoring systems, LLM interactions...  ...of experience in backend software engineering, with a focus on Python in well‑established... 
    Work experience placement

    RST Recruitment

    San Francisco, CA
    1 day ago
  •  ...Quantum superintelligence is an AI that uses quantum computers...  ...of the world's software engineers. AI is already generating...  ...science. Role Overview As a Member of Technical Staff you will shape Conductor's...  ...collection, labelling, and inference. Integrate with external systems... 

    Conductor Quantum

    San Francisco, CA
    1 day ago
  •  ...building the best way to talk to AI and humans together — where AI...  ...day, and everyone talks to users. Member of Technical Staff is the title we use for engineers who own hard problems end to end...  ..., fine-tuning, evaluation, inference, or RAG at scale High-performance... 

    Shapes

    San Francisco, CA
    1 day ago
  •  ...is hiring builders to join our Multimodal AI group, an industry-leading team defining...  ...modalities we have yet to invent. As an engineer on the Multimodal AI team, you will work...  ...products end‑to‑end, from problem definition to technical design, implementation, and launch. Hill... 

    Perplexity AI Inc.

    San Francisco, CA
    15 hours ago
  • Krew is on a mission to transform credit-servicing with the industry’s most advanced AI credit-servicing agents. Job Responsibilities: Engineer text and email agents using our proprietary email agent framework Maintain and develop our email agent framework Develop custom... 
    Full time
    Internship

    Krew Inc.

    San Francisco, CA
    5 hours ago
  • $150k - $280k

     ...Member of Technical Staff (Backend) San Francisco, CA Compensation: $150,000 –...  ...for banks and fintechs using AI agents that function like human...  ...and is expanding its engineering team to accelerate development...  ...pipelines, distributed inference, and automation frameworks.... 
    Full time
    Temporary work
    H1b
    Work at office
    Visa sponsorship
    Relocation package

    Fuku

    San Francisco, CA
    1 day ago
  •  ...Member Of Technical Staff We're looking for a member of technical staff to build...  ...deploy production-grade AI systems. In this role, you'...  ...scalable pipelines for training, inference, and data processing...  ...Master's in computer science, engineering, or related field Strong... 

    ERAGON

    San Francisco, CA
    5 days ago
  • $200k - $300k

    Location San Francisco Employment Type Full time Department AI Compensation $200K - $300K • Offers Equity U.S. Benefits Full‑time...  ...the amounts listed above. Perplexity is seeking an energetic engineer to join our highly driven Comet Agents engineering team. The... 
    Full time
    Flexible hours

    B Capital

    San Francisco, CA
    2 days ago
  •  ...Us Radical Numerics is an AI research lab building general...  ...the barrier to creating engineered threats and AI-generated bioweapons...  .... About the Role As a Member of Technical Staff focused on statistical...  ...randomization, TWAS, causal inference, cross-ancestry genetics, admixed... 
    Local area

    Radical Numerics Inc.

    San Francisco, CA
    3 days ago
  • Member of Technical Staff, Applied Research About Us At Fireworks, we’re building the future of generative AI infrastructure. Our platform delivers the highest...  ...and most scalable inference in the industry. We’ve...  ...of ML research, systems engineering, and customer‑facing problem... 

    SupportFinity™

    San Francisco, CA
    1 day ago
  • Requirements 1 - 7 years of software engineering experience (We are hiring at...  ...to developer tools or AI/ML repositories (Desirable) Inference & Hardware Knowledge: Interest...  ...job involves We are seeking a Member of Technical Staff, Evals & Post‑Training Product... 

    Fireworks AI

    San Francisco, CA
    1 day ago
  • About the Role As a Member of Technical Staff, AI Supercomputing at Radical Numerics, you will design,...  ...powers our large-scale training and inference. You will deliver high-performance,...  ...experimentation. Collaborate across research and engineering. Partner closely with model... 
    Local area

    Radical Numerics Inc.

    San Francisco, CA
    2 days ago
  • $200k

     ...RL, ultra-long context, and inference-time compute to achieve...  ...important decisions. As a Member of Technical Staff on Evals, you will build both...  ...about helping researchers and engineers make better decisions...  ...collaborative team working on frontier AI systems Magic strives to be... 
    Visa sponsorship
    Relocation package

    Magic

    San Francisco, CA
    2 days ago
  • $150k - $300k

     ...Chief Scientist, Together AI), Dylan Patel (...  ...runs the jobs. Core Technical Responsibilities Hosted...  ...Kubernetes-based training and inference orchestration across...  ...We're looking for engineers who are fluent across...  ...development and encourage team members to contribute to the... 
    Work at office
    Local area
    Remote work
    Visa sponsorship
    Relocation package
    Flexible hours

    Kubelt

    San Francisco, CA
    2 days ago
  •  ...Us Radical Numerics is an AI research lab building general...  ...the barrier to creating engineered threats and AI-generated bioweapons...  .... About the Role As a Member of Technical Staff, Infrastructure & Training...  ...exceptional training and inference systems: infrastructure that... 
    Local area

    Radical Numerics Inc.

    San Francisco, CA
    3 days ago
  • The opportunity We are looking for a Member of Technical Staff with deep expertise in generative modelling...  ...team of machine learners, protein engineers and biologists, jointly working to...  ...architectures, training dynamics and inference behaviour. You are a skilful ML developer... 
    Flexible hours

    Gravity Engineering Services Pvt Ltd.

    San Francisco, CA
    4 days ago
  • Member of Technical Staff, ML Systems Mirendil Mirendil is a tech-first company...  ...is to democratize frontier AI R&D across scientific disciplines...  ...includes researchers and engineers from Anthropic, Google...  ...Building and scaling training and inference infrastructure (potentially... 

    Mirendil

    San Francisco, CA
    3 days ago
  •  ...heterogeneous neocloud for AI workloads. As AI systems scale...  ...Gimlet Labs is seeking a Member of Staff focused on AI Research (Intern...  ...experimenting with novel inference efficiency techniques such as...  ...degree in computer science, engineering, or comparable area of study... 
    Internship

    Gimlet Labs

    San Francisco, CA
    15 hours ago
  • $150k

     ...robotics at Amazon's Frontier AI & Robotics team, where you'...  ...robotic intelligence. As a Member of Technical Staff, you'll spearhead the...  ...action models, efficient model inference, and video tokenization...  ...contributions Collaborate with engineering teams to optimize and scale... 
    Local area

    Amazon Science

    San Francisco, CA
    15 hours ago
  •  ...recognize parts of inputs that are unimportant, reducing inference costs for scale-ups and enterprises that integrate LLMs into...  ...team is 5 people with a research and product focus. As a Member of Technical Staff on our infrastructure team, you'll own the cloud systems... 
    Visa sponsorship

    The Token Company

    San Francisco, CA
    15 hours ago
