AI Inference Tech SDM - Lead High-Perf LLM Inference

Payfuture Technologies

Software Development Manager, AI Inference Technology, Neuron SDK job at Annapurna Labs (U.S.) Inc. in Seattle, WA.

DESCRIPTION

AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon Elastic Compute Cloud (EC2) to new product innovations that continue to set AWS’s services and features apart in the industry. We develop AWS Neuron, the complete software stack for Trainium, Amazon's custom cloud-scale machine learning accelerators. Come optimize LLMs such as Llama and GPT OSS to run really fast on Trainium.

As the SDM for the Neuron Inference Technology building blocks team, you will guide your expert AI engineers to build fundamental inference technology building blocks and libraries that enable AI developers to optimize models for inference on Trainium and Inferentia devices. We’re currently focusing on MoE models such as GPT OSS for Trainium 2 and the upcoming Trainium 3. You will develop and optimize blocks such as attention kernels and deliver them in the Neuronx_Distributed Inference Libraries, enabling customers to optimize LLMs, multimodal, and generative models.

The ideal candidate will have an established background in optimizing LLMs, such as delivering high-performance models using distributed inference libraries. You should be capable of managing demanding, fast-changing priorities. You should have the technical depth to understand and deliver as part of a vertically integrated system stack consisting of the PyTorch inference library, Neuron compiler, runtime and collectives.

A day in the life

You will work with your senior management and technical leaders to define the building blocks for the latest LLMs, then build and deliver them to customers. You will manage changing priorities as new models and new technologies emerge, and adapt your team’s work accordingly. You will dive deep to help your team solve technical challenges.

About the team

About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating. That’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.

Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.

Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.

Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.

BASIC QUALIFICATIONS

- 3+ years of engineering team management experience
- 7+ years of experience working directly within engineering teams
- 3+ years of experience designing or architecting (design patterns, reliability and scaling) new and existing systems
- Experience partnering with product or program management teams

Vacancy posted 3 days ago

Similar jobs that could be interesting for you
Based on the AI Inference Tech SDM - Lead High-Perf LLM Inference in Seattle, WA vacancy

Tech Lead, Data & Inference Engineer
...Tech Lead, Data & Inference Engineer Seattle, Washington, United States About the Job Tech Lead, Data... ...to convert static audience lists into high match and cross channel segments without... ...with a specialized vertical in Applied AI, Machine Learning, and Data Science. We...
Suggested
Full time
Catalyst Labs, LLC
Seattle, WA
1 day ago
Software Engineer Graduate (Inference Infrastructure) - 2026 Start (PhD)
$148.2k - $300.96k
...About the Team The Inference Infrastructure team... ...plane for large-scale LLM inference. We are... ...that is highly performant, massively... ...developers to bring AI workloads from research... ...with great people. We lead with curiosity, humility... ...a rapidly growing tech company. By...
Suggested
Temporary work
Local area
ByteDance
Seattle, WA
1 day ago
Senior AI Inference Engineer - Model Optimization & Deployment
$242k - $290k
...Engineer, you will focus on bringing highly efficient, production-ready... ..., and build highly concurrent inference code to ensure real-time,... ...maximize memory bandwidth on AI accelerators. Write production... ...technologies (e.g., TensorRT-LLM). $242,000 - $290,000 a year...
Suggested
Temporary work
Relocation package
Zoox
Seattle, WA
2 days ago
Senior/Staff Software Engineer - Machine Learning Platform (Inference)
$236k - $339.25k
...usher in this new era, we seek AI-native thinkers across every function... ...curiosity, treating AI as a high-trust collaborator that is core... ...-the-art machine learning and LLM workloads. Join us to define... ...Experience in serving LLMs using inference engines like vLLM, TensorRT-LLM...
Suggested
Flexible hours
Streamlit
Bellevue, WA
2 days ago
Senior Software Engineer - Perf and Benchmarking
$182k - $242k
...Software Engineer - Perf and Benchmarking... ...Essential Cloud for AI™. Built for pioneers... ...confidence. Trusted by leading AI labs, startups,... ...Training and Inference runs, including workload... ...deliver reliable, high-quality code.... ...model-serving stacks (llm-d, vLLM, TensorRT-LLM...
Suggested
Permanent employment
Temporary work
Casual work
Work at office
Remote work
Flexible hours
CoreWeave
Bellevue, WA
3 days ago
AI Inference Engineer: Kubernetes, vLLM, Customer Delivery
Red Hat, LLC is seeking a Forward Deployed Engineer to enhance their LLM-D and vLLM platforms. You will be responsible for deploying and optimizing distributed inference systems on Kubernetes, working closely with customer teams. The ideal candidate has extensive experience...
Red Hat, LLC
Seattle, WA
17 hours ago
Senior AI Inference Data Plane Engineer (Remote)
$167.2k - $209k
A pioneering cloud service provider in Seattle seeks a Senior Engineer 2 for its AI Inference Data Plane team. This role requires designing and delivering high-scale, resilient data services. Responsibilities include technical leadership, system design, performance optimization...
Remote work
DigitalOcean
Seattle, WA
10 days ago
Staff Software Engineer, Cloud Inference Safeguards
$405k
...create reliable, interpretable, and steerable AI systems. We want AI to be safe and... ...the Safeguards organization and the Cloud Inference team: taking classifiers, detection signals... ...stability, or overall architecture Hold a high operational bar: own on‑call, drive root‑cause...
Work at office
Visa sponsorship
Flexible hours
Menlo Ventures
Seattle, WA
4 days ago
SR Principal Software Engineer - LLM Engineering
...We're looking for a tech leader ready to... ...deliver trusted market-leading technology products... ...inferencing for high throughput and low... ...optimization using Model Inference servers such as... ...production operations for AI workloads,... ...architecting and deploying LLM & GNN solutions on...
Chase
Seattle, WA
2 days ago
Senior Software Engineer II, Inference
$165k - $242k
CoreWeave is The Essential Cloud for AI™. Built for pioneers by... ...AI with confidence. Trusted by leading AI labs, startups, and global... ...evolve our Kubernetes-native inference platform and meet strict P99 SLAs... ...(vLLM, Triton, TensorRT-LLM, Ray Serve, TorchServe)....
Permanent employment
Full time
Temporary work
Casual work
Work at office
Remote work
Flexible hours
Shift work
Coreweave
Bellevue, WA
17 hours ago
Software Engineer, Inference AI/ML
$92k - $135k
...CoreWeave is the AI Hyperscaler™, delivering a cloud platform... ...technology provides enterprises and leading AI labs with the most... ...What You’ll Do: Join the Inference team to ship production features... ...(e.g., Triton, vLLM, TensorRT-LLM, Ray Serve). Write tests, code...
Permanent employment
Full time
Temporary work
Casual work
Internship
Work at office
Remote work
Flexible hours
Coreweave
Bellevue, WA
17 hours ago
AI Inference Infrastructure Software Engineer (Kubernetes / Cloud)
...Seattle, WA (Hybrid - 3 days/week in office) About ElastixAI ElastixAI is an early-stage Software startup on a mission to reinvent AI inference infrastructure from the ground up. We're building a next-generation inference platform that delivers unprecedented efficiency by...
Work at office
Flexible hours
3 days per week
ElastixAI Inc.
Seattle, WA
20 days ago
AI Inference Infra Engineer - Kubernetes & Cloud
ElastixAI INC. in Seattle seeks an Inference Infrastructure Software Engineer to manage the cloud and Kubernetes backbone behind their Token... ...benefits, and the opportunity to work at the forefront of AI technology in a collaborative environment.
ElastixAI INC.
Seattle, WA
3 days ago
Senior Backend Engineer, AI Inference Platform
A leading database platform provider is seeking a Software Engineer 3 to design and develop core systems for a multi-tenant inference platform integrated with their database service. The role emphasizes collaboration with AI engineers, optimizing performance in a cloud...
MongoDB
Seattle, WA
4 days ago
ML Platform Engineer — Inference & Optimization (Hybrid)
An innovative AI startup is seeking a talented Machine Learning Engineer to play a key role in building their core AI inference platform in Seattle. Responsibilities include designing and developing components, researching and implementing advanced ML techniques, and collaborating...
ElastixAI INC.
Seattle, WA
17 hours ago
Principal Software Engineer - High Performance Computing
...working for one of the world's leading financial institutions, you've... ...teaching them best practices in high-performance computing (HPC) practices that intersect with AI/ML. Thus, you are collaborative... ...patterns to optimize training and inference of ML models on various...
Chase
Seattle, WA
2 days ago
CPU Storage Tech Lead
$342k
...infrastructure that powers large-scale AI systems. We design and deliver... ...a CPU & Storage Technical Lead to define and drive the server... ...are optimized for training, inference, and supporting services. You... ...storage vendors. This is a highly strategic role for someone who...
Local area
OpenAI
Seattle, WA
4 days ago
Lead, Engineering Excellence
$179.88k
...WITH Bain’s Vector leads the firm’s software... ...clients improve AI-assisted or AI-led... ...and optimize model inference latency and cost Develop... ...and optimize LLM‑powered applications... ...up or fast‑growing tech company, with a strong... ...of failover, high‑availability, and high...
Full time
Work experience placement
Work at office
Local area
Home office
3 days per week
Bain & Company
Seattle, WA
2 days ago
Lead AI Engineer
...Lead AI Engineer in the Platforms and Products ZS is a place where... ...will… We are seeking a highly motivated Applied AI Engineer... ...and evaluating production-grade LLM systems, including Retrieval-Augmented... ...workflows, and scalable inference pipelines. Design and implement...
Work at office
Worldwide
ZS Associates
Bellevue, WA
2 days ago
Software Engineer (SWE/SWE II), AI Platform- Slack
$117.2k - $223.9k
...Salesforce is the #1 AI CRM, where humans... ...ambition meets action. Tech meets trust. And... ...at the company leading workforce transformation... ...release them with high quality. Equally... ...training, deployment, inference, and monitoring. As... ...platform supports LLM efficiency and model...
Salesforce.Com Inc
Seattle, WA
17 hours ago
Senior Software Development Engineer (GenAI, Agentic AI)
$184.5k
...Software Development - AI Engineer Our Technology... ..., and tools to deliver high-quality experiences for... ...speed. Role Summary Lead the architecture and... ...monitoring and debugging LLM and multi-agent applications... ...pipelines, online inference, monitoring/retraining);...
Local area
Expedia Group
Seattle, WA
2 days ago
Software Engineer (Multiple Levels) - Machine Learning Infrastructure, Slack
$148.5k - $313.7k
...Salesforce is the #1 AI CRM, where humans... ...ambition meets action. Tech meets trust. And... ...at the company leading workforce transformation... ...release them with high quality. Equally... ...training, deployment, inference, and monitoring. As... ...platform supports LLM efficiency and model...
Temporary work
Salesforce
Seattle, WA
5 days ago
Software Engineer, Workload Enablement
$293k
...and operation of cutting-edge AI models. Our work spans system software... ...benchmarks, porting existing inference and training workloads to new,... ...with: ~ PyTorch and modern LLM training/inference stacks ~... ...skills (e.g., Nsight, rocprof, perf, flamegraphs; ability to reason...
OpenAI
Seattle, WA
17 hours ago
Principal Software Engineer
$160k - $250k
...every touchpoint. Backed by leading investors, we're building... ...help define the future of AI-native content operations,... ...fast-evolving product in a high-agency, low-ego environment... ...implementation, and scaling of LLM agents for real-time inference, dynamic prompting, memory...
Gradial
Seattle, WA
4 days ago
Principal Software Engineer
...: DataRobot delivers AI that maximizes impact and... ...and vision. You'll lead by example-rolling up your... ...complexity, and help drive a high-performance culture. You... ..., and optimize the inference engine that powers DataRobot... ...large language model (LLM) serving systems are fast...
Local area
Worldwide
Flexible hours
DataRobot
Seattle, WA
4 days ago
AI/LLM Sr Manager of Software Engineering - Java and Python
...software engineers delivering AI-enabled capabilities across the... ...and workflow-based solutions Lead with a hands-on mindset: stay close... ...use-case delivery, including LLM integration approaches,... ...equivalents) to accelerate secure, high-quality development, test automation...
Flexible hours
Shift work
Chase
Seattle, WA
2 days ago
Software Engineer (Multiple Levels) - Machine Learning Infrastructure, Slack
Description About Slack AI Slack AI's mission is to transform how... ...operates reliable, scalable, and high performance platforms that... ...including model training, deployment, inference, and monitoring. As Slack AI... ...safely. The platform supports LLM efficiency and model transition...
Temporary work
B Capital
Seattle, WA
4 days ago
Software Engineering A
$160k - $215k
Job Summary NetApp’s Cloud AI Team is building a new AI agent product... ...works. You will be part of a high‑performing team and collaborate... ...Experience Experience with LLM integration, AI agent frameworks... ...systems that incorporate model inference into production workflows (tool...
Local area
NetApp, Inc.
Bellevue, WA
3 days ago
Tech Lead Manager, Foundation Models
$298k - $368k
...Tech Lead Manager, Foundation Models Waymo is an autonomous driving technology company with... ...U.S. states. The mission of the Waymo AI Foundations team is to develop machine... ...demonstration, generative modeling, Bayesian inference, hierarchical learning, and robust...
Full time
Temporary work
Remote work
Waymo
Kirkland, WA
17 hours ago
AI/LLM Network Software Development Engineer
$202.16k - $368.22k
...software and hardware co-design, and high-speed networking, to create... ...technologies to support AI/LLM applications. - Design and development... ...things with great people. We lead with curiosity, humility, and a... ...make impact in a rapidly growing tech company. By constantly...
Temporary work
Local area
ByteDance
Seattle, WA
12 hours ago
