Software Engineer, Kernel Reliability

Dormont Manufacturing Company

Cerebras Systems builds the world’s largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs.

Cerebras’ current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra high-speed inference.

Thanks to the groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest Generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order of magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.

About The Role

We’re looking for a deeply technical, hands‑on software engineer to join our on‑field Kernel Reliability team. You’ll help tackle a critical challenge: improving the reliability of our advanced compute clusters and the underlying inference, training, and internal production services. In this role, you’ll work close to the code and design solutions that will scale with our rapidly growing system production and software service offerings. If you have strong fundamentals in systems, debugging, and failure analysis, and you enjoy building tools and solving hard reliability problems, we want to hear from you. New college graduates are welcome.

Responsibilities

  • Contribute to the technical roadmap and execution for kernel‑centric reliability of our internal and customer‑facing systems.
  • Partner with System and Cluster Operations teams to reduce system and service downtime after failure through tooling, analysis, and hands‑on debugging support.
  • Work with the Debug Team to enhance debug tools with the goal of speeding up failure analysis.
  • Collaborate with software teams to improve the software stack, including kernels, for better on‑field debugging and failure analysis.
  • Work with ASIC and hardware architecture teams to co‑design next‑generation architectures with reliability and ease of debug in mind.
  • Participate in incident response, root‑cause analysis, and post‑mortems; drive follow‑ups that measurably improve reliability over time.

Skills & Qualifications

We recognize great engineers come from different backgrounds. If you’re excited about the role, we encourage you to apply even if you don’t meet every qualification.

Required (or demonstrated through projects/internships/coursework):
  • Strong programming skills in C/C++ and Python.
  • Solid foundations in operating systems, computer architecture, and systems programming fundamentals.
  • Ability to debug complex issues using logs, traces, and standard debugging workflows; interest in root‑cause analysis.

Nice to have:
  • Exposure to parallel and distributed programming (message passing, multicore, GPU, embedded, etc.).
  • Experience building or using debug/diagnostic tools (debuggers, core dump handling, tracing, sanitizers, profilers, etc.).
  • Familiarity with debugging distributed and parallel applications (deadlocks, livelocks, race conditions, etc.).
  • Knowledge of computer architecture concepts (instruction pipelining, multithreading, networking, memory systems, etc.).
  • Operations & Monitoring: familiarity with monitoring, incident response, and post‑mortem culture.

Why Join Cerebras

People who are serious about software make their own hardware. At Cerebras we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we’ve reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:
  • Build a breakthrough AI platform beyond the constraints of the GPU.
  • Publish and open source their cutting‑edge AI research.
  • Work on one of the fastest AI supercomputers in the world.
  • Enjoy job stability with startup vitality.
  • Our simple, non‑corporate work culture that respects individual beliefs.

Read our blog: Five Reasons to Join Cerebras in 2026.

Apply today and become part of the forefront of groundbreaking advancements in AI!

Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.

Vacancy posted 1 day ago
