Sr. Software Engineer - AI/ML, AWS Neuron Distributed Training - Performance Optimization
$168.1k - $227.4k
Amazon
Annapurna Labs designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.
AWS Neuron is the complete software stack for the AWS Trainium and Inferentia cloud-scale machine learning accelerators and the Trn3/Trn2/Trn1 and Inf2/Inf1 servers that use them. This role is for a software engineer on the Distributed Training team for AWS Neuron. The role is responsible for development, enablement, and performance tuning of a wide variety of ML model families, including massive-scale multi-modal large language models like Llama, Qwen, gpt-oss, DeepSeek and beyond, as well as multi-modal generation models such as Stable Diffusion, Flux, WAN, and many more.

The Distributed Training team works side by side with chip architects, compiler engineers, and runtime engineers to create, build, and tune distributed training solutions with AWS Trainium, maximize training throughput, minimize time-to-convergence, and push the boundaries of training efficiency on Trainium. You will identify and resolve performance bottlenecks across the stack, from collective communications and memory utilization to compiler optimizations and kernel performance.

Key job responsibilities
This role will lead efforts to optimize distributed training performance on Trainium, with a primary focus on maximizing training throughput, model FLOPs utilization, and efficiency across the Neuron software stack. You will work across PyTorch, JAX, and the Neuron compiler and runtime to enable and tune large-scale training workloads on the latest Trainium instances.

About the team
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we're building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help you develop your engineering expertise so you feel empowered to take on more complex tasks in the future.
BASIC QUALIFICATIONS
- 5+ years of non-internship professional software development experience
- 5+ years of programming experience with at least one software programming language
- 5+ years of experience leading design or architecture (design patterns, reliability and scaling) of new and existing systems
- 5+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
- Experience as a mentor, tech lead or leading an engineering team

PREFERRED QUALIFICATIONS
- Bachelor's degree in computer science or equivalent
- Machine learning knowledge in frameworks and end-to-end model training

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.

The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at

USA, WA, Seattle - 168,100.00 - 227,400.00 USD annually


