
Member of Technical Staff - Model Optimization and Inference

$250k - $350k

Nuance Labs, Inc.

About Nuance Labs

Nuance Labs is building photorealistic, real-time AI avatars with emotional intelligence: a full-duplex audiovisual system that can listen, speak, react, interrupt, and respond like a real person.

We're a Series A company ($60M raised) backed by Lightspeed, Accel, South Park Commons, NVentures, and Define Ventures, with PhDs from MIT, UW, Oxford, CMU, and Johns Hopkins, and industry experience from Apple, Meta, Amazon AGI, and Discord. The team is small, the work is real, and the problems are unsolved.

How Nuance Differentiates

Most conversational AI avatars today are hacks - a face slapped on a speech-to-speech pipeline, stuck in the uncanny valley: emotionless, mechanical, one-turn-at-a-time. Current systems take 2-5 seconds to respond; natural conversation requires sub-500ms. That's a 4-10x latency reduction, and it demands rethinking the entire stack.
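The size of that gap can be checked directly. A minimal sketch, taking the 2-5 second range and the 500 ms target from the paragraph above; `required_speedup` is an illustrative helper, not part of any library or of Nuance's tooling:

```python
def required_speedup(current_ms: float, target_ms: float = 500.0) -> float:
    """Factor by which response latency must shrink to go from current_ms to target_ms."""
    if current_ms <= 0 or target_ms <= 0:
        raise ValueError("latencies must be positive")
    return current_ms / target_ms


# Endpoints of the 2-5 second range quoted for current systems.
for current_ms in (2000.0, 5000.0):
    print(f"{current_ms / 1000:.0f} s -> 500 ms: {required_speedup(current_ms):.0f}x")
# 2 s -> 500 ms: 4x
# 5 s -> 500 ms: 10x
```

So the required improvement is 4x at the fast end of today's range and 10x at the slow end.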

That rethinking starts with full-duplex: an AI that listens and speaks simultaneously, perceives emotion in real time, and responds with a face that actually reflects it. It's an extremely hard problem, and we're developing foundation models designed for it from the ground up.

About the Role

We can train a great model. The next problem is making it fast enough to actually use in a real-time conversation - and that gap is enormous. A model that responds in 3 seconds is a demo. A model that responds in under 500ms is a product.

We're looking for someone who specializes in taking trained models and squeezing every last millisecond out of them. You understand the full stack from model weights to serving infrastructure - quantization, KV cache optimization, kernel-level acceleration, batching strategies - and you know which lever to pull for which problem. You've worked with vLLM, SGLang, or similar frameworks and have opinions about where they fall short.

Our stack is more complex than a standard LLM deployment: we're serving a full-duplex multimodal system that must satisfy strict real-time latency constraints. There's a lot of unsolved optimization work here, and we need someone who finds that genuinely exciting.
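Strict real-time constraints are usually judged on tail latency rather than the average, because one slow turn is what breaks a conversation. A minimal sketch of checking a latency budget, assuming latencies are recorded per conversational turn in milliseconds; the nearest-rank percentile, the p95 default, and the helper names `time_calls` and `meets_slo` are illustrative choices, not a description of Nuance's harness:

```python
import math
import time
from typing import Callable, List, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile (0 < p <= 100) of a non-empty sample."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need a non-empty sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]


def time_calls(fn: Callable[[], object], n: int) -> List[float]:
    """Wall-clock latency of n sequential calls to fn, in milliseconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def meets_slo(samples: Sequence[float], budget_ms: float = 500.0, p: float = 95.0) -> bool:
    """True if the p-th percentile latency fits within the budget."""
    return percentile(samples, p) <= budget_ms


samples_ms = [120.0, 180.0, 240.0, 310.0, 900.0]  # hypothetical per-turn latencies
print(percentile(samples_ms, 50), percentile(samples_ms, 95), meets_slo(samples_ms))
# 240.0 900.0 False
```

In practice `fn` would wrap one full end-to-end turn. In the hypothetical sample the median turn is comfortably fast, yet the p95 check still fails, which is the failure mode an average would hide.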
What You'll Do
  • Own end-to-end inference optimization across our model stack - LLMs, audio models, and diffusion-based components
  • Implement and tune KV cache strategies for long-context conversations, including eviction policies, compression, and memory-efficient attention
  • Evaluate, deploy, and extend inference serving frameworks (vLLM, SGLang, TensorRT-LLM, etc.) for our specific workloads
  • Profile and benchmark end-to-end latency and throughput; identify and systematically eliminate bottlenecks
  • Build internal tooling that makes optimization work faster and more rigorous - profiling viewers, end-to-end inference test harnesses, and other infrastructure that helps the team move quickly
  • Accelerate diffusion model inference - consistency models, step distillation, caching strategies, and custom kernel optimizations
  • Apply and develop quantization techniques (INT8, INT4, GPTQ, AWQ, and beyond) to reduce memory footprint and increase throughput without meaningfully degrading quality
  • Work closely with research and infrastructure to ensure new models ship with optimized serving from day one
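As one example of the KV cache eviction policies mentioned above, a common approach for long conversations pins a few initial "attention sink" tokens and keeps a sliding window of the most recent tokens, as popularized by StreamingLLM. A minimal sketch that only tracks which token positions stay resident; `SinkWindowKVCache`, `num_sink`, and `window` are hypothetical names, a real cache holds per-layer key/value tensors, and this is not a description of Nuance's stack:

```python
from collections import deque
from typing import Deque, List


class SinkWindowKVCache:
    """Tracks which token positions stay in a KV cache under a sink + sliding-window policy.

    The first `num_sink` positions are pinned permanently; after that only the most
    recent `window` positions are kept, so memory is bounded by num_sink + window.
    """

    def __init__(self, num_sink: int = 4, window: int = 1024) -> None:
        if num_sink < 0 or window <= 0:
            raise ValueError("num_sink must be >= 0 and window must be > 0")
        self.num_sink = num_sink
        self.sinks: List[int] = []
        # deque(maxlen=...) drops the oldest entry when a new one is appended to a full deque.
        self.recent: Deque[int] = deque(maxlen=window)

    def append(self, position: int) -> None:
        """Record one newly decoded token, evicting the oldest non-sink token if needed."""
        if len(self.sinks) < self.num_sink:
            self.sinks.append(position)
        else:
            self.recent.append(position)

    def resident(self) -> List[int]:
        """Positions currently held in the cache, in order."""
        return self.sinks + list(self.recent)


# Usage with small, hypothetical sizes: 10 tokens, 2 sinks, window of 3.
cache = SinkWindowKVCache(num_sink=2, window=3)
for pos in range(10):
    cache.append(pos)
print(cache.resident())  # [0, 1, 7, 8, 9]
```

Memory stays bounded regardless of conversation length; the tradeoff is that evicted mid-conversation context can no longer be attended to. The sketch also ignores how positional encodings are assigned to retained tokens, which a real implementation must handle.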
What We're Looking For
  • Deep expertise in LLM inference optimization - you've worked on KV caching, memory layout, attention kernels, or batching strategies in a production or research context
  • Proficiency with inference serving frameworks - vLLM, SGLang, TensorRT-LLM, or similar - including the ability to go beyond default configurations and adapt them to non-standard use cases
  • Experience optimizing diffusion model inference (latency reduction, step distillation, caching, or kernel-level work)
  • Strong Python and PyTorch skills; comfort reading and writing CUDA or Triton kernels is a significant plus
  • A systematic approach to profiling and optimization - you measure first, then optimize
  • Familiarity with speculative decoding or other inference-time acceleration techniques
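For speculative decoding, the core loop is small: a cheap draft model proposes several tokens and the expensive target model verifies them. A minimal sketch of the greedy variant, assuming "models" are plain next-token functions; `speculative_step`, `draft_model`, and `target_model` are hypothetical stand-ins, not any framework's API:

```python
from typing import Callable, List

# A "model" here is any greedy next-token function: token sequence -> next token id.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: NextToken, target: NextToken, k: int = 4) -> List[int]:
    """Run one round of greedy speculative decoding and return the newly emitted tokens."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target model verifies the proposal. A real implementation scores all
    #    k + 1 positions in a single batched forward pass; this loop is for clarity.
    emitted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            emitted.append(expected)  # first mismatch: keep the target's token, drop the rest
            return emitted
        emitted.append(tok)
        ctx.append(tok)

    # 3. Every draft token was accepted, so the target's next prediction is a bonus token.
    emitted.append(target(ctx))
    return emitted


# Toy stand-ins (hypothetical): the target counts up mod 10; the draft agrees except after a 5.
def target_model(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10


def draft_model(seq: List[int]) -> int:
    return 0 if seq[-1] == 5 else (seq[-1] + 1) % 10


print(speculative_step([1], draft_model, target_model))  # [2, 3, 4, 5, 6]
print(speculative_step([3], draft_model, target_model))  # [4, 5, 6]
```

Because a mismatch is always replaced by the target's own choice, the output matches greedy decoding with the target alone; the speedup comes from verifying several tokens per target pass when the draft is usually right. Sampling-based speculative decoding replaces the equality check with a probabilistic acceptance rule.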
Bonus Points
  • Hands-on experience with post-training quantization (GPTQ, AWQ, or similar) and understanding of quality/performance tradeoffs
  • Familiarity with multimodal or streaming inference architectures
  • Experience deploying real-time AI systems with hard latency SLAs
  • Prior work at an AI lab, inference startup, or on a high-traffic model serving platform
  • Contributions to open-source inference frameworks
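GPTQ and AWQ are usually compared against a simpler baseline: round-to-nearest symmetric INT8 quantization with one scale per output channel. A minimal sketch of that baseline only (not GPTQ or AWQ, which additionally use calibration data to reduce error), using plain Python lists as a stand-in for weight tensors; the function names and example weights are hypothetical:

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def quantize_int8_per_row(weights: Sequence[Sequence[float]]) -> Tuple[List[List[int]], List[float]]:
    """Symmetric round-to-nearest INT8 quantization with one scale per row (output channel)."""
    q_rows: List[List[int]] = []
    scales: List[float] = []
    for row in weights:
        max_abs = max((abs(w) for w in row), default=0.0)
        # Map [-max_abs, max_abs] onto [-127, 127]; an all-zero row gets a dummy scale.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        # Python's round() rounds half to even; clamping guards against float overshoot.
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows: List[List[int]], scales: List[float]) -> Matrix:
    """Reconstruct approximate float weights as q * scale."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


def max_abs_error(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Largest element-wise absolute difference between two matrices of equal shape."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical 2x3 weight matrix whose rows have different magnitudes.
w = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
q, s = quantize_int8_per_row(w)
print(q[0][1], q[1][0])  # -127 127
print(max_abs_error(w, dequantize(q, s)) <= max(s) / 2 + 1e-12)  # True
```

Per-row scales keep one large-magnitude channel from inflating the rounding error of every other channel; each reconstructed weight is off by at most half of its row's scale, which is the quality cost that calibration-based methods try to shrink further.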
Compensation

$250,000 - $350,000 base salary, plus meaningful equity. We think long-term ownership matters and structure equity accordingly.

Logistics
  • Location: In-person in Seattle, 5 days a week - we believe in the compounding value of working shoulder-to-shoulder
  • Health: HSA plan with ~$2,000 in company contributions - about 2x what most big tech companies offer
  • PTO: 15 days + public holidays, and we close for a full week over the holidays
  • Lunch, beverages, and snacks: On us, every workday - the kind of thing that makes you actually look forward to coming in
  • Commuter benefits
  • 401(k): In the works

Nuance Labs is an equal opportunity employer. We believe diverse teams build better AI.
Vacancy posted 15 hours ago
Similar jobs that could be interesting for you, based on the Member of Technical Staff - Model Optimization and Inference in Seattle, WA vacancy
  • $180k

     ...Member Of Technical Staff - Imagine Model Palo Alto, CA; Seattle, WA About XAI XAI's mission is to create...  ...data curation, modeling, training, inference serving, and product integration,...  ...learning systems. Ability to deliver optimal end-to-end user experiences.... 
    Suggested
    Temporary work

    Xai

    Seattle, WA
    3 days ago
  • $202.16k - $368.22k

     ...easier way. The team has research groups dedicated to generative models for content creation, image generation, video synthesis,...  ...We are seeking an experienced Multimodal Model Training and Inference Optimization Engineer with expertise in optimizing AI model training and... 
    Suggested
    Temporary work
    Local area

    ByteDance

    Seattle, WA
    20 hours ago
  • $180k

     ...Member Of Technical Staff - Model Training Austin, TX; New York, NY; Palo Alto, CA; Seattle, WA About XAI XAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly... 
    Suggested
    Temporary work

    Xai

    Seattle, WA
    2 days ago
  • $242k - $290k

     ...Model Optimization & Deployment Engineer The Perception team is pioneering the development of a multi-modality foundation model to drive...  ...models, write custom CUDA kernels, and build highly concurrent inference code to ensure real-time, deterministic execution on edge... 
    Suggested
    Temporary work
    Relocation package

    Zoox

    Seattle, WA
    2 days ago
  • $300k - $400k

     ...re developing foundation models designed for it from the...  ...'re looking for a deeply technical Member of Technical Staff to own RL and post-training...  ...reward modeling, policy optimization, evaluation, data feedback...  ...large-scale training or inference systems, including rollout... 
    Suggested
    Shift work

    Nuance Labs, Inc.

    Seattle, WA
    15 hours ago
  • $159.75k - $255.6k

    Sr. Full Stack Member of Technical Staff Seattle, Washington, United States Join...  ...the full stack, from data, models, and infrastructure to...  ...constrained devices. Architect and optimize full‑stack AI pipelines....  ...platforms for large‑scale inference and training. Strong... 
    Work at office

    Axon

    Seattle, WA
    4 days ago
  • Bright Vision Technologies is seeking a Model Serving Engineer to design and operate highly reliable inference platforms for large machine learning models. This remote...  ...-driven environment. Responsibilities include optimizing model performance, integrating with API... 
    Remote job
    Full time

    Bright Vision Technologies

    Bellevue, WA
    2 days ago
  • $180k

     ...be able to concisely and accurately share knowledge with their teammates. ABOUT THE ROLE: You will work on the most critical modeling challenges at any given time. You will get clarity on your first project before an offer. BASIC QUALIFICATIONS: You... 
    Temporary work

    xAI

    Seattle, WA
    11 days ago
  • $139.5k - $258.1k

    Large Machine Learning Model Optimization Engineer Seattle, Washington, United States Software and Services Our team is an applied research...  ...High performance kernel implementation Distributed inference At Apple, base pay is one part of our total compensation package... 
    Relocation

    Apple Inc.

    Seattle, WA
    4 days ago
  • $70k - $300k

     ...Staff AI Software Engineer - Edge Model Optimization & Deployment FieldAI is transforming how robots interact with...  ...this role, you will own the edge inference stack end to end, profiling and...  ...behavior in the field. You will set technical direction, raise engineering... 

    Field AI

    Seattle, WA
    2 days ago
  • $79.2k - $178.1k

     ...the Oracle Cloud to provide the broadest, most comprehensive cloud in the industry. Responsibilities As a Senior Member of Technical Staff, you will own the software design and development for major components of Oracle's Cloud Infrastructure. You should be both... 
    Temporary work
    Worldwide
    Flexible hours

    Oracle

    Seattle, WA
    4 days ago
  • $96.8k - $223.4k

     ...innovation and excellence. As a valued member of our software engineering division in...  ...experiences. Collaborate and lead technical discussions across multiple teams to...  ...design principles Data management: data modeling, data warehousing, data governance... 
    Temporary work
    Remote work
    Flexible hours

    Oracle

    Seattle, WA
    3 days ago
  • $300k - $400k

     ...problem, and we're developing foundation models designed for it from the ground up....  ...the Role We're looking for a deeply technical MTS to own distributed training infrastructure...  ...for long-running training jobs. Optimize large-scale training performance across... 

    Nuance Labs, Inc.

    Seattle, WA
    15 hours ago
  • $200k - $300k

     ...It's an extremely hard problem, and we're developing foundation models designed for it from the ground up. About the Role Model...  ...production-level pipelines — quickly and without losing correctness Optimize pipeline throughput and efficiency at scale; identify and... 

    Nuance Labs, Inc.

    Seattle, WA
    15 hours ago
  • $139.5k - $258.1k

     ...LLM Machine Learning Research Engineer, Model Optimization & Algorithms Development, SIML The Apple Intelligence Model Optimization and Algorithms Development team brings innovative AI research into Apple products. We are looking for strong Machine Learning Applied... 
    Relocation

    Apple

    Seattle, WA
    4 days ago
  • Senior Software Developer OCI Compute is looking for strong Senior Software Developers with a strong cloud/distributed systems/microservices background to take on the challenge of engineering Compute Infrastructure solutions and build services for Large Scale Compute...
    Flexible hours

    Oracle

    Seattle, WA
    2 days ago
  • Join Oracle Cloud Infrastructure’s Compute team to design, build, and scale the next generation of bare-metal provisioning systems powering millions of servers worldwide. As a senior engineer, you will develop highly reliable and secure infrastructure, tackle complex distributed...
    Worldwide
    Flexible hours

    Ll Oefentherapie

    Seattle, WA
    4 days ago
  •  ...Seattle is seeking a skilled Support Analyst to provide exceptional support for members using their AI-powered solutions. The ideal candidate will have 3-5 years of experience in a technical role, with expertise in financial services technology. Responsibilities include... 

    Range

    Seattle, WA
    3 days ago
  •  ...differently. You do not accept the status-quo. You challenge the current model of the world and take leaps of faith to build it better for...  ...improvements to the Spice.ai OSS project. 30‑60 days - take technical and engineering ownership of an entire feature area. 60‑90... 

    Geek

    Bellevue, WA
    20 hours ago
  • $57 per hour

     ...reinforcement learning framework, high-performance inference, and heterogeneous hardware compilation technologies for AI foundation models. Conduct research on infrastructure and...  ...related to large‑scale systems, inference optimization, compilers, or performance optimization.... 
    Hourly pay
    Internship
    Local area

    ByteDance

    Seattle, WA
    4 days ago
  • $26 per hour

    Aston Carter is seeking a Supply Chain Analyst based in Seattle, WA. This contract position entails driving continuous improvement in processes while problem-solving and designing solutions for the supply chain network. Candidates should have a Bachelor's degree in Engineering...
    Contract work

    Aston Carter

    Seattle, WA
    2 days ago
  •  ...people’s face”. Foundation Model Services team, within Machine...  ...have a chance to work on optimizing billions of parameter language...  ...team to prototype and develop inference for cutting edge model architectures...  ...Computer Science or related technical field. Preferred... 

    Apple

    Seattle, WA
    2 days ago
  •  ...Python Infrastructure Engineer - Model Evaluation (AI Training) About the Role...  ...What You'll Do Design, build, and optimize high-performance Python systems supporting...  ...harnesses for ML models, integrating with inference frameworks Improve reliability, performance... 
    Hourly pay
    Ongoing contract
    Contract work
    Freelance
    Remote work
    Flexible hours

    Alignerr

    Seattle, WA
    3 days ago
  • $181.1k - $318.4k

     ...Sr./Staff ML Infrastructure Engineer, Compute (TPU Scheduling) - Foundation Model Work Locations (2) Submit Resume Apple is where...  ...of large-scale training and inference jobs. This role spans...  ...engineering, and performance optimization. Responsibilities Design... 
    Relocation

    Apple

    Seattle, WA
    4 days ago
  • Apple Inc. in Seattle, Washington, is seeking an experienced Machine Learning Engineer to join the Foundation Model Services team. You will work closely with product teams to build solutions that launch models for millions of customers in real time. The ideal candidate... 

    Apple Inc.

    Seattle, WA
    20 hours ago
  • $33 per hour

     ...Responsibilities: Carry out research on various technical subjects as requested by IT Team...  ...incident tickets to appropriate team members. Skills Required: Excellent...  ...challenges with flexibility and optimism Education/Experience Required: 4+... 
    Contract work
    Work experience placement
    Work at office
    Local area
    Remote work
    Relocation

    Keylent Inc

    Seattle, WA
    4 days ago
  • $180k

    Job Description Job Description About xAI xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization ...
    Temporary work

    xAI

    Seattle, WA
    17 days ago
  • $68.4k - $90k

     ...repair, maintain, and upgrade PC hardware and equipment to ensure optimal workstation performance. Document incidents, problems,...  ...hardware. Work independently while being a collaborative team member committed to team success. Follow standard operating procedures... 
    Permanent employment
    Temporary work
    Work at office
    Remote work
    Flexible hours
    Weekend work

    Corient Capital Partners

    Seattle, WA
    2 days ago
  • $90k - $120k

     ...systems. This position is responsible for the maintenance, optimization, and technical support of office-wide BIM platforms. The BIM Strategist...  ...Autodesk to resolve software issues. Ensure that project models are set‑up and maintained according to firm standards. Maintain... 
    Work at office
    Local area
    Work from home
    Flexible hours

    Archinect

    Seattle, WA
    4 days ago
  • $60k

     ...homebuilder in Bellevue, Washington, seeks a New Home Advisor to enhance the customer experience and optimize home sales. The role involves guiding prospective buyers through model home tours, managing post-purchase communication, and ensuring seamless sales processes. Key... 

    Tri Pointe Homes Holdings, Inc.

    Bellevue, WA
    2 days ago
