Cloud Hardware Development Engineer, Cloud AI/ML/storage server teams

$157.3k - $212.8k

Amazon Locker

Description

As a Cloud Hardware Development Engineer, you will be an end-to-end owner of storage and/or accelerator (AI/ML/GPU) server platforms - from New Product Introduction (NPI) through fleet health in production. You own the full lifecycle: design, development, qualification, launch, and ongoing operational excellence of servers running at scale in the AWS fleet.

You will work closely with internal customers to understand their technical needs and business goals, leveraging your experience with server design and the knowledge of various teams to architect solutions we deploy at scale. To deliver your products, you will work with an interdisciplinary team of component, firmware, power, mechanical, electrical, test, qualification, and manufacturing engineers, and lead our ODMs (design and manufacturing partners) to bring these servers to the data center. After launch, you own the fleet - monitoring quality, driving reliability improvements, and ensuring servers continue to meet customer requirements throughout their operational life.

This role demands deep technical curiosity and the willingness to jump in and personally solve the hardest problems. When a complex system failure occurs - whether during NPI qualification or in a production fleet of hundreds of thousands of servers - you roll up your sleeves, dive into the details across hardware, firmware, software, and physical layers, and drive to root cause. You don't wait for someone else to figure it out.

You will own end-to-end system reliability - proactively identifying deficiencies and driving toward zero-touch operations where automation detects, diagnoses, and resolves issues before customer impact. You will decompose complex server system problems (testability, reliability, diagnostics) into deliverable tasks and features, leading delivery yourself and through others in parallel.

This is a fast-paced, intellectually challenging position. You'll work with thought leaders in multiple technology areas, hold high standards for yourself and everyone you work with, and constantly look for ways to improve your products' performance, quality, and cost. We're changing an industry, and we want individuals who are ready for this challenge and want to reach beyond what is possible today.

Key job responsibilities

NPI - New Product Introduction

  • Own the end-to-end NPI lifecycle for storage and/or accelerator (AI/ML/GPU) server platforms - from architecture definition through design, qualification, manufacturing ramp, and launch

  • Lead technical solutions for complex server and rack system architectural challenges

  • Work with ODM/manufacturing partners to develop, validate, and manufacture server products at scale

  • Develop functional specifications, design verification plans, and test procedures

  • Drive qualification and readiness milestones, ensuring new platforms meet performance, reliability, and cost targets before fleet deployment

  • Identify and resolve technical risks early in the development cycle - don't let problems reach production

Fleet Health, Diagnostics & Automation

  • Own fleet health for the server platforms you launch - reliability doesn't end at ship

  • Design and implement predictive failure detection systems using telemetry, sensor data, error trending, and log correlation to identify hardware issues before they cause customer impact

  • Drive toward zero-touch operations - help build detection, diagnosis, and remediation of faults without human intervention

  • Debug complex system failures in time-sensitive settings - personally diving deep when the problem demands it

  • Perform root cause analysis correlating across firmware, kernel, driver, thermal, power, and physical layers

Systems Design & Technical Depth

  • Apply expertise across hardware, software, system design, x86 architecture, processes, and operations (compute, storage, network, GPU)

  • Design and implement solutions to address system-level issues at large scale

  • Decompose complex server system problems (testability, reliability, diagnostics) into deliverable tasks and features

  • Collaborate with hardware, software, manufacturing, supply chain, and product management teams

Cross-Team Collaboration

  • Work closely with internal customers to ensure new server hardware meets data path and control path requirements

  • Identify potential problems early when onboarding new servers into customer ecosystems

  • Collaborate across Hardware Engineering, component, firmware, test, qualification, and integration teams

  • Partner with datacenter operations to close the loop between field failures and design improvements

A day in the life

Your day-to-day responsibilities include interfacing with internal and external customers to understand product requirements and facilitate system development on top of your server designs. You will learn the operational challenges facing our existing fleet, with the goal of improving the current customer experience and developing improved systems for future designs. You will work directly with vendors and ODMs (manufacturing partners) to scale your product. Some days you're reviewing a new platform design with your ODM; other days you're deep in logs and telemetry data chasing a failure mode across the fleet. You thrive on that range.

Basic Qualifications

  • Experience in developing functional specifications, design verification plans and functional test procedures

  • Bachelor's degree or above in electrical engineering, computer engineering, or equivalent

  • Strong English-language communication skills, both written and verbal

  • Experience with design & innovation and research & development

  • Knowledge of operating systems, hardware, storage, network, security, database administration and cloud infrastructure

  • Experience in server technologies such as thermal, mechanical, power, and signal integrity

  • 5+ years of professional work (non-internship) experience

Preferred Qualifications

  • 5+ years of experience in hardware design and validation of components, subsystems, and systems

  • Experience in server technologies: board design, high-speed bus design and signal integrity, failure analysis, server components (CPU, GPU, SSDs, memory), BIOS, BMC, and networking

  • Experience developing and executing test procedures for mechanical or electrical systems/components

  • Experience working with ODMs/manufacturers through the product development and manufacturing lifecycle

  • Experience building predictive failure detection or proactive remediation systems at fleet scale

  • Experience with storage/compute/GPU/accelerator platforms including integration, diagnostics, or performance validation

  • Familiarity with PCIe topology, NVLink, NVMe, and accelerator interconnects

  • Experience with large-scale datacenter or cloud environments

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company's reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.

The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at .

USA, CA, Cupertino - 157,300.00 - 212,800.00 USD annually

USA, TX, Austin - 136,000.00 - 184,000.00 USD annually

USA, WA, Seattle - 136,000.00 - 184,000.00 USD annually

Vacancy posted 3 days ago