Member of Technical Staff (AI Inference Engineer)
Perplexity AI
Inference Engine Engineer
We build and run the inference engine behind every Perplexity query, deploying dozens of model architectures at scale under tight latency and cost budgets. Our stack is Rust, Python, CUDA, and CuTe DSL, and we need another engineer to join us.
What You Will Work On
Support transformer-based retrieval, text generation, and multimodal models in our inference infrastructure, from weight loading, request scheduling, and KV-cache management through to support in the API Gateway.
Port our in-house CUDA kernels to NVIDIA's CuTe DSL so they run on GB200 today and are portable to Vera Rubin racks tomorrow.
Develop our internal Rust-based inference server to eliminate the pain points of Python-based serving and keep up with rapidly growing traffic.
Profile and fix bottlenecks from network ingress through continuous batching and GPU kernel interleaving.
Build dashboards, alerts, and automated remediation so we catch regressions before users do. Respond to and learn from production incidents.
Who We're Looking For
Deep experience with GPU programming and performance work (CUDA, Triton, CUTLASS, or similar). Any other deep systems programming experience is a plus.
You understand modern LLM architectures and are able to bring them up reliably in a production environment.
You've built and operated production distributed systems under real load - ideally performance-critical ones.
Comfortable working across languages and layers: Rust for the serving runtime, Python for model code, CUDA/CuTe DSL for kernels.
You own problems end-to-end. You can read a research paper on Monday, write a kernel on Wednesday, and debug a production incident on Friday.
Self-directed. You do well in fast-moving environments where the path forward isn't laid out for you.
Good If You've Touched Any Of
ML compilers and framework internals: PyTorch internals, torch.compile, custom operators.
Distributed GPU communication: NCCL, NVLink, InfiniBand, RDMA libraries, model/tensor parallelism.
Low-precision inference: INT8/FP8/FP4 quantization, mixed-precision serving.
Profiling and debugging tools: Nsight Compute/Systems, CUDA-GDB, PTX/SASS analysis.
Container orchestration: Kubernetes, GPU scheduling, autoscaling inference workloads.
Qualifications
3+ years of professional software engineering experience with meaningful work on ML inference or high-performance systems.
Familiarity with at least one deep learning framework (PyTorch, JAX, TensorFlow).
Understanding of GPU architectures (memory hierarchy, warp scheduling, tensor cores).
Understanding of common LLM architectures and inference optimization techniques (e.g. quantization, speculative decoding, prefill-decode disaggregation).
$200k - $350k