Cloud Hardware Development Engineer, Cloud AI/ML/storage server teams
$157.3k - $212.8k
Amazon
Application deadline: Jun 6, 2026
As a Cloud Hardware Development Engineer, you will be the end-to-end owner of storage and/or accelerator (AI/ML/GPU) server platforms, from New Product Introduction (NPI) through fleet health in production. You own the full lifecycle: design, development, qualification, launch, and ongoing operational excellence of servers running at scale in the AWS fleet. You will work closely with internal customers to understand their technical needs and business goals, leveraging your server design experience and the knowledge of various teams to architect solutions we deploy at scale. To deliver your products, you will work with an interdisciplinary team of component, firmware, power, mechanical, electrical, test, qualification, and manufacturing engineers, and lead our ODMs (design and manufacturing partners) to bring these servers to the data center. After launch, you own the fleet: monitoring quality, driving reliability improvements, and ensuring servers continue to meet customer requirements throughout their operational life.

This role demands deep technical curiosity and the willingness to jump in and personally solve the hardest problems. When a complex system failure occurs, whether during NPI qualification or in a production fleet of hundreds of thousands of servers, you roll up your sleeves, dive into the details across hardware, firmware, software, and physical layers, and drive to root cause. You don't wait for someone else to figure it out. You will own end-to-end system reliability, proactively identifying deficiencies and driving toward zero-touch operations, where automation detects, diagnoses, and resolves issues before they affect customers. You will decompose complex server system problems (testability, reliability, diagnostics) into deliverable tasks and features, leading delivery both personally and through others in parallel. This is a fast-paced, intellectually challenging position.
You'll work with thought leaders in multiple technology areas, hold high standards for yourself and everyone you work with, and constantly look for ways to improve your products' performance, quality, and cost. We're changing an industry, and we want individuals who are ready for this challenge and want to reach beyond what is possible today.

Key job responsibilities

NPI - New Product Introduction
- Own the end-to-end NPI lifecycle for storage and/or accelerator (AI/ML/GPU) server platforms, from architecture definition through design, qualification, manufacturing ramp, and launch
- Lead technical solutions for complex server and rack system architectural challenges
- Work with ODM/manufacturing partners to develop, validate, and manufacture server products at scale
- Develop functional specifications, design verification plans, and test procedures
- Drive qualification and readiness milestones, ensuring new platforms meet performance, reliability, and cost targets before fleet deployment
- Identify and resolve technical risks early in the development cycle; don't let problems reach production

Fleet Health, Diagnostics & Automation
- Own fleet health for the server platforms you launch; reliability doesn't end at ship
- Design and implement predictive failure detection systems that use telemetry, sensor data, error trending, and log correlation to identify hardware issues before they cause customer impact
- Drive toward zero-touch operations by helping build detection, diagnosis, and remediation of faults without human intervention
- Debug complex system failures in time-sensitive settings, personally diving deep when the problem demands it
- Perform root cause analysis by correlating across firmware, kernel, driver, thermal, power, and physical layers

Systems Design & Technical Depth
- Apply expertise across hardware, software, system design, x86 architecture, processes, and operations (compute, storage, network, GPU)
- Design and implement solutions to address system-level issues at large scale
- Decompose complex server system problems (testability, reliability, diagnostics) into deliverable tasks and features
- Collaborate with hardware, software, manufacturing, supply chain, and product management teams

Cross-Team Collaboration
- Work closely with internal customers to ensure new server hardware meets data path and control path requirements
- Identify potential problems with onboarding new servers into customer ecosystems early
- Collaborate across Hardware Engineering, component, firmware, test, qualification, and integration teams
- Partner with datacenter operations to close the loop between field failures and design improvements

A day in the life
Your day-to-day responsibilities include interfacing with internal and external customers to understand product requirements and facilitate system development on top of your server designs. You will learn the operational challenges facing our existing fleet, with the goal of improving the current customer experience and developing improved systems for future designs. You will work directly with vendors and ODMs (manufacturing partners) to scale your product. Some days you're reviewing a new platform design with your ODM; other days you're deep in logs and telemetry data chasing a failure mode across the fleet. You thrive on that range.
BASIC QUALIFICATIONS
- Experience developing functional specifications, design verification plans, and functional test procedures
- Bachelor's degree or above in electrical engineering, computer engineering, or equivalent
- English-language communication skills, both written and verbal
- Experience with design & innovation and research & development
- Knowledge of operating systems, hardware, storage, network, security, database administration, and cloud infrastructure
- Experience in server technologies such as thermal, mechanical, power, and signal integrity
- 5+ years of professional work (non-internship) experience

PREFERRED QUALIFICATIONS
- 5+ years of experience in hardware design and validation of components, subsystems, and systems
- Experience in server technologies: board design, high-speed bus design and signal integrity, failure analysis, server components (CPU, GPU, SSDs, memory), BIOS, BMC, and networking
- Experience developing and executing test procedures for mechanical or electrical systems/components
- Experience working with ODMs/manufacturers through the product development and manufacturing lifecycle
- Experience building predictive failure detection or proactive remediation systems at fleet scale
- Experience with storage/compute/GPU/accelerator platforms, including integration, diagnostics, or performance validation
- Familiarity with PCIe topology, NVLink, NVMe, and accelerator interconnects
- Experience with large-scale datacenter or cloud environments

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.
Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies.

Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company's reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.

The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave.
Learn more about our benefits at

USA, CA, Cupertino - 157,300.00 - 212,800.00 USD annually
USA, WA, Seattle - 136,000.00 - 184,000.00 USD annually

Vacancy posted 2 days ago