Member of Technical Staff (AI Inference Engineer)
Perplexity AI
Inference Engine Engineer
We build and run the inference engine behind every Perplexity query and deploy dozens of model architectures at scale under tight latency and cost budgets. Our stack is Rust, Python, CUDA, and CuTe DSL, and we need another engineer to join us.
What You Will Work On
Support transformer-based retrieval, text-generation, and multimodal models across our inference infrastructure, from weight loading, request scheduling, and KV-cache management through to API Gateway support.
Port our in-house CUDA kernels to NVIDIA's CuTe DSL so they run on GB200 today and are portable to Vera Rubin racks tomorrow.
Develop our internal Rust-based inference server to resolve our Python pain points and keep up with rapidly growing traffic.
Profile and fix bottlenecks from network ingress through continuous batching and GPU kernel interleaving.
Build dashboards, alerts, and automated remediation so we catch regressions before users do. Respond to and learn from production incidents.
Who We're Looking For
Deep experience with GPU programming and performance work (CUDA, Triton, CUTLASS, or similar). Any other deep systems programming experience is a plus.
You understand modern LLM architectures and are able to bring them up reliably in a production environment.
You've built and operated production distributed systems under real load - ideally performance-critical ones.
You are comfortable working across languages and layers: Rust for the serving runtime, Python for model code, CUDA/CuTe DSL for kernels.
You own problems end-to-end. You can read a research paper on Monday, write a kernel on Wednesday, and debug a production incident on Friday.
Self-directed. You do well in fast-moving environments where the path forward isn't laid out for you.
Good If You Touched Any Of
ML compilers and framework internals: PyTorch internals, torch.compile, custom operators.
Distributed GPU communication: NCCL, NVLink, InfiniBand, RDMA libraries, model/tensor parallelism.
Low-precision inference: INT8/FP8/FP4 quantization, mixed-precision serving.
Profiling and debugging tools: Nsight Compute/Systems, CUDA-GDB, PTX/SASS analysis.
Container orchestration: Kubernetes, GPU scheduling, autoscaling inference workloads.
Qualifications
3+ years of professional software engineering experience with meaningful work on ML inference or high-performance systems.
Familiarity with at least one deep learning framework (PyTorch, JAX, TensorFlow).
Understanding of GPU architectures (memory hierarchy, warp scheduling, tensor cores).
Understanding of common LLM architectures and inference optimization techniques (e.g. quantization, speculative decoding, prefill-decode disaggregation).
$100k - $300k

