Member of Technical Staff - Model Optimization and Inference (Experienced)

$250k - $350k

Nuance Labs, Inc.

About Nuance Labs

Nuance Labs is building photorealistic, real-time AI avatars with emotional intelligence: a full-duplex audiovisual system that can listen, speak, react, interrupt, and respond like a real person.

We're a Series A company ($60M raised) backed by Lightspeed, Accel, South Park Commons, NVentures, and Define Ventures, with PhDs from MIT, UW, Oxford, CMU, and Johns Hopkins, and industry experience from Apple, Meta, Amazon AGI, and Discord. The team is small, the work is real, and the problems are unsolved.

How Nuance Differentiates

Most conversational AI avatars today are hacks - a face slapped on a speech-to-speech pipeline, stuck in the uncanny valley: emotionless, mechanical, one-turn-at-a-time. Current systems take 2-5 seconds to respond; natural conversation requires sub-500ms. That's up to a 10x improvement, and it demands rethinking the entire stack.

That rethinking starts with full-duplex: an AI that listens and speaks simultaneously, perceives emotion in real time, and responds with a face that actually reflects it. It's an extremely hard problem, and we're developing foundation models designed for it from the ground up.

About the Role

We can train a great model. The next problem is making it fast enough to actually use in a real-time conversation - and that gap is enormous. A model that responds in 3 seconds is a demo. A model that responds in under 500ms is a product.

We're looking for someone who specializes in taking trained models and squeezing every last millisecond out of them. You understand the full stack from model weights to serving infrastructure - quantization, KV cache optimization, kernel-level acceleration, batching strategies - and you know which lever to pull for which problem. You've worked with vLLM, SGLang, or similar frameworks at scale and have strong opinions about where they fall short.

This posting is aimed at experienced engineers and researchers who've operated at a senior to senior-staff level at big tech, a leading AI lab, or a high-traffic inference team. Everyone at Nuance is MTS - we don't run title ladders - but we're hiring people who have already done this work at scale.

Our stack is more complex than a standard LLM deployment: we're serving a full-duplex multimodal system that must satisfy strict real-time latency constraints. There's a lot of unsolved optimization work here, and we need someone who finds that genuinely exciting.

What You'll Do
  • Own end-to-end inference optimization across our model stack - LLMs, audio models, and diffusion-based components
  • Implement and tune KV cache strategies for long-context conversations, including eviction policies, compression, and memory-efficient attention
  • Evaluate, deploy, and extend inference serving frameworks (vLLM, SGLang, TensorRT-LLM, etc.) for our specific workloads
  • Profile and benchmark end-to-end latency and throughput; identify and systematically eliminate bottlenecks
  • Build internal tooling that makes optimization work faster and more rigorous - profiling viewers, end-to-end inference test harnesses, and other infrastructure that helps the team move quickly
  • Accelerate diffusion model inference - consistency models, step distillation, caching strategies, and custom kernel optimizations
  • Apply and develop quantization techniques (INT8, INT4, GPTQ, AWQ, and beyond) to reduce memory footprint and increase throughput without meaningfully degrading quality
  • Work closely with research and infrastructure to ensure new models ship with optimized serving from day one
What We're Looking For
  • Significant hands-on experience with LLM inference optimization - you've shipped work on KV caching, memory layout, attention kernels, or batching strategies in a production or high-traffic research context
  • Proven proficiency with inference serving frameworks - vLLM, SGLang, TensorRT-LLM, or similar - including going well beyond default configurations and adapting them to non-standard workloads
  • Experience optimizing diffusion model inference (latency reduction, step distillation, caching, or kernel-level work)
  • Strong Python and PyTorch skills; comfort reading and writing CUDA or Triton kernels is a significant plus
  • A systematic approach to profiling and optimization - you measure first, then optimize
  • Familiarity with speculative decoding or other inference-time acceleration techniques
Bonus Points
  • Hands-on experience with post-training quantization (GPTQ, AWQ, or similar) and a clear sense of quality/performance tradeoffs
  • Familiarity with multimodal or streaming inference architectures
  • Experience deploying real-time AI systems with hard latency SLAs
  • Prior work at an AI lab, inference startup, or on a high-traffic model serving platform
  • Contributions to open-source inference frameworks
Compensation

$250,000 - $350,000 base salary, plus meaningful equity. We think long-term ownership matters and structure equity accordingly.

Logistics
  • Location: In-person in Seattle, five days a week - we believe in the compounding value of working shoulder-to-shoulder.
  • Visa sponsorship: We sponsor visas (O-1, H-1B, green card) from day one.
  • AI-native tooling: Do your best work with the best tools, including unlimited tokens.
Benefits
  • Health: HSA plan with ~$2,000 in annual company contributions - roughly 2x what most big tech companies put in.
  • Time off: 15 days of PTO plus public holidays, and we close the office for a full week at year-end.
  • Food: Lunch, drinks, and snacks on us every workday - the small thing that quietly makes the day better.
  • Commuter benefits: We help cover the cost of getting to the office.
  • 401(k): In the works.

Nuance Labs is an equal opportunity employer. We believe diverse teams build better AI.
Vacancy posted 14 hours ago