Member of Technical Staff - Model Optimization and Inference (Experienced)

$250k - $350k

Nuance Labs, Inc.

About Nuance Labs

Nuance Labs is building photorealistic, real-time AI avatars with emotional intelligence: a full-duplex audiovisual system that can listen, speak, react, interrupt, and respond like a real person.

We're a Series A company ($60M raised) backed by Lightspeed, Accel, South Park Commons, NVentures, and Define Ventures, with PhDs from MIT, UW, Oxford, CMU, and Johns Hopkins, and industry experience from Apple, Meta, Amazon AGI, and Discord. The team is small, the work is real, and the problems are unsolved.

How Nuance Differentiates

Most conversational AI avatars today are hacks - a face slapped on a speech-to-speech pipeline, stuck in the uncanny valley: emotionless, mechanical, one-turn-at-a-time. Current systems take 2-5 seconds to respond; natural conversation requires sub-500ms. That's up to a 10x improvement, and it demands rethinking the entire stack.

That rethinking starts with full-duplex: an AI that listens and speaks simultaneously, perceives emotion in real time, and responds with a face that actually reflects it. It's an extremely hard problem, and we're developing foundation models designed for it from the ground up.

About the Role

We can train a great model. The next problem is making it fast enough to actually use in a real-time conversation - and that gap is enormous. A model that responds in 3 seconds is a demo. A model that responds in under 500ms is a product.

We're looking for someone who specializes in taking trained models and squeezing every last millisecond out of them. You understand the full stack from model weights to serving infrastructure - quantization, KV cache optimization, kernel-level acceleration, batching strategies - and you know which lever to pull for which problem. You've worked with vLLM, SGLang, or similar frameworks at scale and have strong opinions about where they fall short.

This posting is aimed at experienced engineers and researchers who've operated at a senior to senior-staff level at big tech, a leading AI lab, or a high-traffic inference team. Everyone at Nuance is a Member of Technical Staff (MTS) - we don't run title ladders - but we're hiring people who have already done this work at scale.

Our stack is more complex than a standard LLM deployment: we're serving a full-duplex multimodal system that must satisfy strict real-time latency constraints. There's a lot of unsolved optimization work here, and we need someone who finds that genuinely exciting.

What You'll Do

Own end-to-end inference optimization across our model stack - LLMs, audio models, and diffusion-based components
Implement and tune KV cache strategies for long-context conversations, including eviction policies, compression, and memory-efficient attention
Evaluate, deploy, and extend inference serving frameworks (vLLM, SGLang, TensorRT-LLM, etc.) for our specific workloads
Profile and benchmark end-to-end latency and throughput; identify and systematically eliminate bottlenecks
Build internal tooling that makes optimization work faster and more rigorous - profiling viewers, end-to-end inference test harnesses, and other infrastructure that helps the team move quickly
Accelerate diffusion model inference - consistency models, step distillation, caching strategies, and custom kernel optimizations
Apply and develop quantization techniques (INT8, INT4, GPTQ, AWQ, and beyond) to reduce memory footprint and increase throughput without meaningfully degrading quality
Work closely with research and infrastructure to ensure new models ship with optimized serving from day one

What We're Looking For

Significant hands-on experience with LLM inference optimization - you've shipped work on KV caching, memory layout, attention kernels, or batching strategies in a production or high-traffic research context
Proven proficiency with inference serving frameworks - vLLM, SGLang, TensorRT-LLM, or similar - including going well beyond default configurations and adapting them to non-standard workloads
Experience optimizing diffusion model inference (latency reduction, step distillation, caching, or kernel-level work)
Strong Python and PyTorch skills; comfort reading and writing CUDA or Triton kernels is a significant plus
A systematic approach to profiling and optimization - you measure first, then optimize
Familiarity with speculative decoding or other inference-time acceleration techniques

Bonus Points

Hands-on experience with post-training quantization (GPTQ, AWQ, or similar) and a clear sense of quality/performance tradeoffs
Familiarity with multimodal or streaming inference architectures
Experience deploying real-time AI systems with hard latency SLAs
Prior work at an AI lab, inference startup, or on a high-traffic model serving platform
Contributions to open-source inference frameworks

Compensation

$250,000 - $350,000 base salary, plus meaningful equity. We think long-term ownership matters and structure equity accordingly.

Logistics

Location: In-person in Seattle, five days a week - we believe in the compounding value of working shoulder-to-shoulder.
Visa sponsorship: We sponsor visas (O-1, H-1B, green card) from day one.
AI-native tooling: Do your best work with the best tools, including unlimited tokens.

Benefits

Health: HSA plan with ~$2,000 in annual company contributions - roughly 2x what most big tech companies put in.
Time off: 15 days of PTO plus public holidays, and we close the office for a full week at year-end.
Food: Lunch, drinks, and snacks on us every workday - the small thing that quietly makes the day better.
Commuter benefits: We help cover the cost of getting to the office.
401(k): In the works.

Nuance Labs is an equal opportunity employer. We believe diverse teams build better AI.

Vacancy posted 14 hours ago
