Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training - Performance Optimization

$168.1k - $227.4k

Amazon

Annapurna Labs designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.

AWS Neuron is the complete software stack for the AWS Trainium and Inferentia cloud-scale machine learning accelerators and the Trn3/Trn2/Trn1 and Inf2/Inf1 servers that use them. This role is for a software engineer in the Distributed Training team for AWS Neuron. This role is responsible for development, enablement and performance tuning of a wide variety of ML model families, including massive scale multi-modal large language models like Llama, Qwen, gpt-oss, DeepSeek and beyond, as well as multi-modal generation models such as Stable Diffusion, Flux, WAN, and many more.

The Distributed Training team works side by side with chip architects, compiler engineers and runtime engineers to create, build and tune distributed training solutions with AWS Trainium, maximize training throughput, minimize time-to-convergence, and push the boundaries of training efficiency on Trainium. You will identify and resolve performance bottlenecks across the stack, from collective communications and memory utilization to compiler optimizations and kernel performance.

Key job responsibilities

This role will lead efforts to optimize distributed training performance on Trainium, with a primary focus on maximizing training throughput, model FLOPs utilization (MFU), and efficiency across the Neuron software stack. You will work across PyTorch, JAX, and the Neuron compiler and runtime to enable and tune large-scale training workloads on the latest Trainium instances.

About the team

Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we're building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help our team members develop their engineering expertise so they feel empowered to take on more complex tasks in the future.

BASIC QUALIFICATIONS

- 5+ years of non-internship professional software development experience

- 5+ years of programming experience with at least one software programming language

- 5+ years of experience leading design or architecture (design patterns, reliability and scaling) of new and existing systems

- 5+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations

- Experience as a mentor, tech lead or leading an engineering team

PREFERRED QUALIFICATIONS

- Bachelor's degree in computer science or equivalent

- Knowledge of machine learning frameworks and end-to-end model training

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.

The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at

USA, WA, Seattle - 168,100.00 - 227,400.00 USD annually
Vacancy posted 4 days ago