Software Development Engineer, ML Infrastructure Team
Amazon
Software Development Engineer II
Want to help drive the success of Machine Learning technologies at AWS? We seek a Software Development Engineer II for the ML Infrastructure team to build the platforms that guarantee top performance of AWS ML and HPC technologies. Join us as we expand the AWS offerings for AI, including Trainium, Neuron, and the Elastic Fabric Adapter (EFA). You'll build CI/CD systems, orchestrate GPU clusters, create performance dashboards, and develop AI-powered automation - all to ensure the latest ML networking software ships with confidence.
Key job responsibilities:
- Build and maintain infrastructure that monitors and reports on functionality and performance of massive testing workloads run at scale across multiple GPU instance types.
- Use Jenkins, internal Amazon CI/CD tools, Linux, and public AWS products to automate testing and delivery of ML networking libraries - including collective communication frameworks, network transport layers, and GPU communication libraries.
- Write Python code that orchestrates large clusters and runs benchmarks and ML applications across a matrix of instance types, operating systems, and software stack versions.
- Use AWS Managed Grafana and Athena to digest performance data and build dashboards that catch functional and performance regressions before they reach customers.
- Build automation using LLMs to analyze test failures and surface actionable insights to developers.
- Contribute to cross-team readiness for new instance type launches by delivering performance data that shapes go/no-go decisions.
- Manage the complexity of infrastructure that spans many instance types, software stacks, Linux operating systems, and the latest releases, and make it easy to evolve.
A day in the life:
- You write Python to orchestrate test workloads across large GPU clusters and TypeScript with CDK to ensure all infrastructure is defined as code, reviewed, and committed to automated pipelines.
- You manage shared development clusters using SLURM and AWS ParallelCluster, supporting multiple teams while keeping costs down.
- You build automation that analyzes nightly test results and surfaces regressions to developers.
- You write crisp designs for your projects, communicating clearly to your peers what you will build and why.
About the team:
We are part of Annapurna Labs, a subsidiary within AWS that builds the software and hardware that make ML on EC2 work. Our organization is a dedicated group of innovators who have invented new networks, new silicon, and new software suites, and combined them to entice customers to move immense ML and HPC workloads to the cloud. The ML Infrastructure team is laser-focused on making AWS the best and most cost-effective place for customers to do AI at scale. Our work directly enables the launch of new GPU instance types.
$143.7k - $194.4k

