Member of Technical Staff (AI Inference Engineer)

$220k

Perplexity

We build and run the inference engine behind every Perplexity query, deploying dozens of model architectures at scale under tight latency and cost budgets. Our stack is Rust, Python, CUDA, and CuTe DSL, and we need another engineer to join us.

What you will work on

Examples of real work the team does:

New model support. Support transformer-based retrieval, text-generation, and multimodal models in our inference infrastructure, from weight loading, request scheduling, and KV-cache management through to support in the API Gateway.

GPU kernel migration to CuTe DSL. Port our in-house CUDA kernels to NVIDIA's CuTe DSL so they run on GB200 today and are portable to Vera Rubin racks tomorrow.

Rust-native serving runtime. Develop our internal Rust-based inference server to solve the pain points of Python and keep up with rapidly growing traffic.

Performance optimisation. Profile and fix bottlenecks from network ingress through continuous batching and GPU kernel interleaving.

Reliability and observability. Build dashboards, alerts, and automated remediation so we catch regressions before users do. Respond to and learn from production incidents.

Who we're looking for

Deep experience with GPU programming and performance work (CUDA, Triton, CUTLASS, or similar). Any other deep systems programming experience is a plus.

You understand modern LLM architectures and can bring them up reliably in a production environment.

You've built and operated production distributed systems under real load, ideally performance-critical ones.

You are comfortable working across languages and layers: Rust for the serving runtime, Python for model code, CUDA/CuTe DSL for kernels.

You own problems end-to-end. You can read a research paper on Monday, write a kernel on Wednesday, and debug a production incident on Friday.

You are self-directed. You do well in fast-moving environments where the path forward isn't laid out for you.
Good if you have touched any of the following

ML compilers and framework internals: PyTorch internals, torch.compile, custom operators.

Distributed GPU communication: NCCL, NVLink, InfiniBand, RDMA libraries, model/tensor parallelism.

Low-precision inference: INT8/FP8/FP4 quantization, mixed-precision serving.

Profiling and debugging tools: Nsight Compute/Systems, CUDA-GDB, PTX/SASS analysis.

Container orchestration: Kubernetes, GPU scheduling, autoscaling inference workloads.

Qualifications

3+ years of professional software engineering experience with meaningful work on ML inference or high-performance systems.

Familiarity with at least one deep learning framework (PyTorch, JAX, TensorFlow).

Understanding of GPU architectures (memory hierarchy, warp scheduling, tensor cores).

Understanding of common LLM architectures and inference optimization techniques (e.g. quantization, speculative decoding, prefill-decode disaggregation).

Compensation

Range: $220K - $485K

Vacancy posted 4 days ago

