Member of Technical Staff (AI Inference Engineer)

Perplexity AI

Inference Engine Engineer

We build and run the inference engine behind every Perplexity query, deploying dozens of model architectures at scale under tight latency and cost budgets. Our stack is Rust, Python, CUDA, and CuTe DSL, and we need another engineer to join us.

What You Will Work On

Support transformer-based retrieval, text-generation, and multimodal models in our inference infrastructure, from weight loading, request scheduling, and KV-cache management through to support in the API gateway.
Port our in-house CUDA kernels to NVIDIA's CuTe DSL so they run on GB200 today and are portable to Vera Rubin racks tomorrow.
Develop our internal Rust-based inference server to eliminate our Python pain points and keep up with rapidly growing traffic.
Profile and fix bottlenecks from network ingress through continuous batching and GPU kernel interleaving.
Build dashboards, alerts, and automated remediation so we catch regressions before users do. Respond to and learn from production incidents.

Who We're Looking For

Deep experience with GPU programming and performance work (CUDA, Triton, CUTLASS, or similar). Any other deep systems programming experience is a plus.
You understand modern LLM architectures and are able to bring them up reliably in a production environment.
You've built and operated production distributed systems under real load - ideally performance-critical ones.
Comfortable working across languages and layers: Rust for the serving runtime, Python for model code, CUDA/CuTe DSL for kernels.
You own problems end-to-end. You can read a research paper on Monday, write a kernel on Wednesday, and debug a production incident on Friday.
Self-directed. You do well in fast-moving environments where the path forward isn't laid out for you.

Good If You've Touched Any Of

ML compilers and framework internals: PyTorch internals, torch.compile, custom operators.
Distributed GPU communication: NCCL, NVLink, InfiniBand, RDMA libraries, model/tensor parallelism.
Low-precision inference: INT8/FP8/FP4 quantization, mixed-precision serving.
Profiling and debugging tools: Nsight Compute/Systems, CUDA-GDB, PTX/SASS analysis.
Container orchestration: Kubernetes, GPU scheduling, autoscaling inference workloads.

Qualifications

3+ years of professional software engineering experience with meaningful work on ML inference or high-performance systems.
Familiarity with at least one deep learning framework (PyTorch, JAX, TensorFlow).
Understanding of GPU architectures (memory hierarchy, warp scheduling, tensor cores).
Understanding of common LLM architectures and inference optimization techniques (e.g. quantization, speculative decoding, prefill-decode disaggregation).

Vacancy posted 4 days ago
