AI Inference Tech SDM - Lead High-Perf LLM Inference
Payfuture Technologies
Software Development Manager, AI Inference Technology, Neuron SDK job at Annapurna Labs (U.S.) Inc. Seattle, WA.

DESCRIPTION

AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon Elastic Compute Cloud (EC2) to new product innovations that continue to set AWS’s services and features apart in the industry. We develop AWS Neuron, the complete software stack for Trainium, Amazon's custom cloud-scale machine learning accelerators. Come optimize LLMs such as Llama and GPT OSS to run really fast on Trainium.

As the SDM for the Neuron Inference Technology building blocks team, you will guide your expert AI engineers to build fundamental inference technology building blocks and libraries that enable AI developers to optimize models for inference on Trainium and Inferentia devices. We’re currently focusing on MoE models such as GPT OSS for Trainium 2 and the upcoming Trainium 3. You will develop and optimize blocks such as attention kernels and deliver them in the Neuronx_Distributed Inference Libraries, enabling customers to optimize LLMs, multimodal, and generative models.

The ideal candidate will have an established background in optimizing LLMs, such as delivering high-performance models using distributed inference libraries. You should be capable of managing demanding, fast-changing priorities. You should have the strong technical ability to understand and deliver as part of a vertically integrated system stack consisting of the PyTorch inference library, Neuron compiler, runtime and collectives.

A day in the life

You will work with your senior management and technical leaders to define the building blocks for the latest LLMs, then build and deliver them to customers. You will manage changing priorities as new models and new technologies emerge, and adapt your team’s work accordingly. You will dive deep to help your team solve technical challenges.
About the team

About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating. That’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.

Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.

Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.

Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.

BASIC QUALIFICATIONS
- 3+ years of engineering team management experience
- 7+ years of experience working directly within engineering teams
- 3+ years of experience designing or architecting new and existing systems (design patterns, reliability, and scaling)
- Experience partnering with product or program management teams

