Principal Reliability Scientist
Graphcore
Graphcore is one of the world's leading innovators in Artificial Intelligence compute. It is developing hardware, software and systems infrastructure that will unlock the next generation of AI breakthroughs and power the widespread adoption of AI solutions across every industry.
As part of the SoftBank Group, Graphcore is a member of an elite family of companies responsible for some of the world's most transformative technologies. Together, they share a bold vision: to enable Artificial Super Intelligence and ensure its benefits are accessible to everyone.
Graphcore's teams are drawn from diverse backgrounds and bring a broad range of skills and perspectives. A melting pot of AI research specialists, silicon designers, software engineers and systems architects, Graphcore enjoys a culture of continuous learning and constant innovation.
Job Summary
Reporting to the Quality leadership within Manufacturing Operations, the Principal Reliability Scientist leads reliability activities across complex, high-performance systems. Working closely with established reliability experts and cross-functional teams, the role uses experimental data and advanced modelling to inform design decisions, validate product reliability and optimise serviceability strategies, including spares provisioning.
The Team
The Quality team within Manufacturing Operations is responsible for ensuring product robustness, reliability and lifecycle performance across Graphcore's hardware portfolio. The team includes experienced reliability specialists and works closely with technology research, chip, board, system design, platform and operations teams to translate reliability insights into actionable improvements across the product lifecycle.
Responsibilities and Duties:
· Define and refine reliability requirements across silicon, board and system levels, working in partnership with research and design teams
· Apply advanced reliability methodologies to highly innovative systems, including challenges associated with liquid-cooled architectures and fluid dynamics
· Design and execute experiments to generate high-quality reliability and performance data, ensuring statistical rigour and relevance
· Analyse experimental, field and manufacturing data to quantify reliability metrics such as MTBF, MTTR, RAS characteristics and soft error rates (SER)
· Use data-driven insights to inform product design trade-offs, reliability targets and spares provisioning strategies
· Collaborate with chip, board and system design teams to influence architecture and component selection based on reliability considerations
· Support development of system-level reliability models incorporating thermal, mechanical and fluid behaviour
· Lead complex root cause investigations into reliability issues, driving corrective and preventative actions across teams
· Contribute to the evolution of reliability tools, processes and best practices within the organisation
· Communicate complex reliability concepts, risks and recommendations clearly to a wide range of stakeholders
Qualifications:
· Strong background in reliability engineering or reliability science within semiconductor, hardware or complex systems environments
· Experience of physics-of-failure approaches in high-performance computing, AI hardware or related domains
· Experience with reliability modelling, experimental design and statistical data analysis
· Proven ability to work with and interpret experimental reliability data to drive engineering decisions
· Experience with key reliability metrics such as MTBF, MTTR, RAS and failure rate analysis
· Ability to operate effectively in complex, cross-functional environments with multiple stakeholders
· Strong problem-solving skills with the ability to lead technically challenging investigations independently
· Excellent communication skills, with the ability to influence design and operations teams using data-driven insights
Preferred Qualifications:
· Experience with liquid cooling systems, fluid dynamics or thermally complex hardware environments
· Knowledge of soft error mechanisms and SER modelling
· Experience contributing to reliability strategy, processes or tooling improvements


