Principal Reliability Scientist

Graphcore

Graphcore is one of the world's leading innovators in Artificial Intelligence compute. It is developing hardware, software and systems infrastructure that will unlock the next generation of AI breakthroughs and power the widespread adoption of AI solutions across every industry.

As part of the SoftBank Group, Graphcore is a member of an elite family of companies responsible for some of the world's most transformative technologies. Together, they share a bold vision: to enable Artificial Super Intelligence and ensure its benefits are accessible to everyone.

Graphcore's teams are drawn from diverse backgrounds and bring a broad range of skills and perspectives. A melting pot of AI research specialists, silicon designers, software engineers and systems architects, Graphcore enjoys a culture of continuous learning and constant innovation.

Job Summary

Reporting to the Quality leadership within Manufacturing Operations, the Principal Reliability Scientist is responsible for leading reliability activities across complex, high-performance systems. Working closely with established reliability experts and cross-functional teams, this role uses experimental data and advanced modelling to inform design decisions, validate product reliability and optimise serviceability strategies, including spares provisioning.

The Team

The Quality team within Manufacturing Operations is responsible for ensuring product robustness, reliability and lifecycle performance across Graphcore's hardware portfolio. The team includes experienced reliability specialists and works closely with technology research, chip, board, system design, platform and operations teams to translate reliability insights into actionable improvements across the product lifecycle.

Responsibilities and Duties:

· Define and refine reliability requirements across silicon, board and system levels, working in partnership with research and design teams

· Apply advanced reliability methodologies to highly innovative systems, including challenges associated with liquid-cooled architectures and fluid dynamics

· Design and execute experiments to generate high-quality reliability and performance data, ensuring statistical rigour and relevance

· Analyse experimental, field and manufacturing data to quantify reliability metrics such as MTBF, MTTR, RAS characteristics and soft error rates (SER)

· Use data-driven insights to inform product design trade-offs, reliability targets and spares provisioning strategies

· Collaborate with chip, board and system design teams to influence architecture and component selection based on reliability considerations

· Support development of system-level reliability models incorporating thermal, mechanical and fluid behaviour

· Lead complex root cause investigations into reliability issues, driving corrective and preventative actions across teams

· Contribute to the evolution of reliability tools, processes and best practices within the organisation

· Communicate complex reliability concepts, risks and recommendations clearly to a wide range of stakeholders

Qualifications:
  • Strong background in reliability engineering or reliability science within semiconductor, hardware or complex systems environments
  • Experience of physics-of-failure approaches in high-performance computing, AI hardware or related domains
  • Experience with reliability modelling, experimental design and statistical data analysis
  • Proven ability to work with and interpret experimental reliability data to drive engineering decisions
  • Experience with key reliability metrics such as MTBF, MTTR, RAS and failure rate analysis
  • Ability to operate effectively in complex, cross-functional environments with multiple stakeholders
  • Strong problem-solving skills with the ability to lead technically challenging investigations independently
  • Excellent communication skills, with the ability to influence design and operations teams using data-driven insights

Preferred Qualifications:

· Experience with liquid cooling systems, fluid dynamics or thermally complex hardware environments

· Knowledge of soft error mechanisms and SER modelling

· Experience contributing to reliability strategy, processes or tooling improvements

Vacancy posted 2 days ago