Member of Technical Staff - Model Optimization and Inference

$250k - $350k

Nuance Labs, Inc.

About Nuance Labs

Nuance Labs is building photorealistic, real-time AI avatars with emotional intelligence: a full-duplex audiovisual system that can listen, speak, react, interrupt, and respond like a real person.

We're a Series A company ($60M raised) backed by Lightspeed, Accel, South Park Commons, NVentures, and Define Ventures, with PhDs from MIT, UW, Oxford, CMU, and Johns Hopkins, and industry experience from Apple, Meta, Amazon AGI, and Discord. The team is small, the work is real, and the problems are unsolved.

How Nuance Differentiates

Most conversational AI avatars today are hacks - a face slapped on a speech-to-speech pipeline, stuck in the uncanny valley: emotionless, mechanical, one-turn-at-a-time. Current systems take 2-5 seconds to respond; natural conversation requires sub-500ms. That's a 4-10x improvement, and it demands rethinking the entire stack.

That rethinking starts with full-duplex: an AI that listens and speaks simultaneously, perceives emotion in real time, and responds with a face that actually reflects it. It's an extremely hard problem, and we're developing foundation models designed for it from the ground up.

About the Role

We can train a great model. The next problem is making it fast enough to actually use in a real-time conversation - and that gap is enormous. A model that responds in 3 seconds is a demo. A model that responds in under 500ms is a product.

We're looking for someone who specializes in taking trained models and squeezing every last millisecond out of them. You understand the full stack from model weights to serving infrastructure - quantization, KV cache optimization, kernel-level acceleration, batching strategies - and you know which lever to pull for which problem. You've worked with vLLM, SGLang, or similar frameworks and have opinions about where they fall short.

Our stack is more complex than a standard LLM deployment: we're serving a full-duplex multimodal system that must satisfy strict real-time latency constraints. There's a lot of unsolved optimization work here, and we need someone who finds that genuinely exciting.

What You'll Do

Own end-to-end inference optimization across our model stack - LLMs, audio models, and diffusion-based components
Implement and tune KV cache strategies for long-context conversations, including eviction policies, compression, and memory-efficient attention
Evaluate, deploy, and extend inference serving frameworks (vLLM, SGLang, TensorRT-LLM, etc.) for our specific workloads
Profile and benchmark end-to-end latency and throughput; identify and systematically eliminate bottlenecks
Build internal tooling that makes optimization work faster and more rigorous - profiling viewers, end-to-end inference test harnesses, and other infrastructure that helps the team move quickly
Accelerate diffusion model inference - consistency models, step distillation, caching strategies, and custom kernel optimizations
Apply and develop quantization techniques (INT8, INT4, GPTQ, AWQ, and beyond) to reduce memory footprint and increase throughput without meaningfully degrading quality
Work closely with research and infrastructure to ensure new models ship with optimized serving from day one

What We're Looking For

Deep expertise in LLM inference optimization - you've worked on KV caching, memory layout, attention kernels, or batching strategies in a production or research context
Proficiency with inference serving frameworks - vLLM, SGLang, TensorRT-LLM, or similar - including the ability to go beyond default configurations and adapt them to non-standard use cases
Experience optimizing diffusion model inference (latency reduction, step distillation, caching, or kernel-level work)
Strong Python and PyTorch skills; comfort reading and writing CUDA or Triton kernels is a significant plus
A systematic approach to profiling and optimization - you measure first, then optimize
Familiarity with speculative decoding or other inference-time acceleration techniques

Bonus Points

Hands-on experience with post-training quantization (GPTQ, AWQ, or similar) and understanding of quality/performance tradeoffs
Familiarity with multimodal or streaming inference architectures
Experience deploying real-time AI systems with hard latency SLAs
Prior work at an AI lab, inference startup, or on a high-traffic model serving platform
Contributions to open-source inference frameworks

Compensation

$250,000 - $350,000 base salary, plus meaningful equity. We think long-term ownership matters and structure equity accordingly.

Logistics

Location: In-person in Seattle, 5 days a week - we believe in the compounding value of working shoulder-to-shoulder
Health: HSA plan with ~$2,000 in company contributions - about 2x what most big tech companies offer
PTO: 15 days + public holidays, and we close for a full week over the holidays
Lunch, beverages, and snacks: On us, every workday - the kind of thing that makes you actually look forward to the workday
Commuter benefits
401(k): In the works

Nuance Labs is an equal opportunity employer. We believe diverse teams build better AI.

Vacancy posted 15 hours ago
