Member of Technical Staff — Model Optimization and Inference (New Grad)
$200k - $300k
Nuance Labs
About Nuance Labs

Nuance Labs is building photorealistic, real-time AI avatars with emotional intelligence: a full‑duplex audiovisual system that can listen, speak, react, interrupt, and respond like a real person.

About The Role

We can train a great model, but the next problem is making it fast enough to actually use in a real‑time conversation. A model that responds in 3 seconds is a demo; a model that responds in under 500 ms is a product.

We’re looking for someone who’s excited about taking trained models and squeezing every last millisecond out of them. You understand—or want to deeply understand—the full stack from model weights to serving infrastructure: quantization, KV cache optimization, kernel‑level acceleration, and batching strategies. You’ve worked with vLLM, SGLang, or similar frameworks (through coursework, research, internships, or open‑source) and have opinions about where they fall short.

This posting is aimed at early‑career engineers finishing or recently finished with a BS, MS, or PhD. We don’t require a PhD – we care about systems intuition, engineering chops, and the appetite to go deep.

What You’ll Do

- Contribute to end‑to‑end inference optimization across our model stack—LLMs, audio models, and diffusion‑based components
- Implement and tune KV cache strategies for long‑context conversations, including eviction policies, compression, and memory‑efficient attention
- Work with inference serving frameworks (vLLM, SGLang, TensorRT‑LLM, etc.) and extend them for our specific workloads
- Profile and benchmark end‑to‑end latency and throughput; identify and systematically eliminate bottlenecks
- Build internal tooling that makes optimization work faster and more rigorous—profiling viewers, end‑to‑end inference test harnesses, and other infrastructure that helps the team move quickly
- Accelerate diffusion model inference—consistency models, step distillation, caching strategies, and custom kernel optimizations
- Apply quantization techniques (INT8, INT4, GPTQ, AWQ, and beyond) to reduce memory footprint and increase throughput without meaningfully degrading quality
- Work closely with research and infrastructure to ensure new models ship with optimized serving from day one

What We’re Looking For

- BS, MS, or PhD in CS, ML, or a related field—completed or in the final stretch
- Strong fundamentals in LLM inference or ML systems—KV caching, memory layout, attention kernels, batching, or serving—picked up through coursework, research, internships, or open‑source. You don’t need to have shipped at production scale yet; you do need to learn fast and go deep.
- Exposure to inference serving frameworks (vLLM, SGLang, TensorRT‑LLM, or similar)—even at a research or hobby level
- Strong Python and PyTorch skills; familiarity with CUDA or Triton is a significant plus
- A systematic approach to profiling and optimization—you measure first, then optimize
- Curiosity about diffusion inference, speculative decoding, quantization, or other inference‑time acceleration techniques

Bonus Points

- Internship or research experience with LLM inference, ML systems, or model serving
- Contributions to open‑source inference frameworks (vLLM, SGLang, TensorRT‑LLM, etc.)
- CUDA / Triton kernel work, even at a research or hobby scale
- Publications or research projects in MLSys, model compression, or inference optimization
- Familiarity with multimodal or streaming inference architectures
- Experience with hard latency SLAs in any real‑time system

Compensation

$200,000 – $300,000 base salary, plus meaningful equity. We think long‑term ownership matters and structure equity accordingly.

Logistics

- Location: In‑person in Seattle, five days a week — we believe in the compounding value of working shoulder‑to‑shoulder.
- Visa sponsorship: We sponsor visas (O‑1, H‑1B, green card) from day one.
- AI‑native tooling: Do your best work with the best tools, including unlimited tokens.

Benefits

- Health: HSA plan with ~$2,000 in annual company contributions — roughly 2× what most big tech companies put in.
- Time off: 15 days of PTO plus public holidays, and we close the office for a full week at year‑end.
- Food: Lunch, drinks, and snacks on us every workday — the small thing that quietly makes the day better.
- Commuter benefits: We help cover the cost of getting to the office.
- 401(k): In the works.

Nuance Labs is an equal opportunity employer. We believe diverse teams build better AI.


