Software Engineer, Hardware Health

Slope

About the Team

The Frontier Systems team at OpenAI builds, launches, and supports the largest supercomputers in the world, which OpenAI uses for its most cutting-edge model training. We take data center designs, turn them into real, working systems, and build any software needed to run large-scale frontier model training. Our mission is to bring up, stabilize, and keep these hyperscale supercomputers reliable and efficient during the training of frontier models.

About the Role

On the Frontier Systems team, you’ll build critical infrastructure that keeps our supercomputers running reliably for cutting-edge AI research. Even a single hardware failure can derail a large-scale training run, so minimizing disruptions is core to the mission. Engineers here own their work end-to-end and are trusted to make a real impact. This role is for someone who goes deep: who thrives on root-causing system-level issues and building automation to catch and fix problems at scale.

In this role, you will:
  • Own and improve the system health checks that keep our hyperscale supercomputers stable during model training.
  • Lead deep dives into hardware failures and system-level bugs to understand how things break at scale.
  • Build automation that monitors and fixes issues across thousands of machines, so researchers can keep moving without interruption.

You might thrive in this role if you have:
  • 7+ years of industry experience in software engineering
  • Proficiency with Python and shell scripting
  • A high degree of comfort digging into noisy data with SQL, PromQL, and Pandas, or any other tool necessary
  • Experience developing reproducible analyses
  • A balance of strengths in building and operationalizing

Bonus if you have:
  • Experience with low-level details of hardware components, protocols, and associated Linux tooling (e.g., PCIe, InfiniBand, networking, power management, kernel perf tuning)
  • Experience with visualization of large data centers and networks
  • Expertise with network operations and tooling
  • Expertise with power management and stabilization

Equal Opportunity Employer

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability, or any other legally protected status.

For US-based candidates: Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.

We are committed to providing reasonable accommodations to applicants with disabilities.

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the Software Engineer, Hardware Health in San Francisco, CA vacancy
  • $250k

    About the Team The Hardware Health and Observability team owns the end-to-end health lifecycle...  ...to researchers and product teams. Engineers on this team own problems end-to-end, from...  ...7+ years of industry experience in software or infrastructure engineering. Strong... 
    Suggested

    OpenAI

    San Francisco, CA
    3 days ago
  • $250k

    Software Engineer, Hardware Health Frontiers Clusters - San Francisco About the Team The Hardware Health and Observability team owns the end-to-end health lifecycle of OpenAI’s global compute fleet. Our mission is to maximize healthy, usable compute across accelerator... 
    Suggested

    OpenAI

    San Francisco, CA
    1 day ago
  • $250k

    OpenAI is seeking a Software Engineer for Hardware Health in San Francisco. The role involves maintaining the health of compute clusters, building automated systems for monitoring hardware, and ensuring efficient operations across large-scale distributed environments. Candidates... 
    Suggested

    OpenAI

    San Francisco, CA
    3 days ago
  •  ...deployment over unchecked growth. About the role As a software engineer on the Fleet Hardware team, you will be responsible for the reliability and...  ...and devise innovative solutions to maintain the health and efficiency of our supercomputing infrastructure.... 
    Suggested
    Full time

    OpenAI

    San Francisco, CA
    4 hours ago
  •  ...About Flow Flow Engineering is an AI-native requirements platform...  ...engineering organizations, enabling hardware teams to collaborate with AI...  ...Flow is seeking Full Stack Software Engineers to build AI-powered...  ...and meaningful equity. Health, dental, and vision coverage.... 
    Suggested
    Flexible hours

    The Engineering Co.

    San Francisco, CA
    1 day ago
  •  ...thinking technology company in San Francisco is seeking a Senior Software Engineer to develop the next generation of AI systems. The ideal...  ...working in a fully remote environment. Prior experience in hardware or electronics is not required, as the company values diverse... 
    Remote work

    Jobleads-US

    San Francisco, CA
    27 days ago
  • $150k - $215k

     ...Horowitz to Blackrock and Fidelity, and employs a team of 450 engineers and entrepreneurs. Astranis designs, builds, and...  ...ft. headquarters in Northern California, USA. SENIOR SOFTWARE ENGINEER - HARDWARE TEST We are seeking a highly skilled Senior Software Engineer... 
    Permanent employment
    Flexible hours
    Rotating shift

    Astranis

    San Francisco, CA
    3 days ago
  • $140k - $170k

     ...quantify fish weights, detect the health status, and generate optimal...  ...at three levels: on-site hardware for image capture, cloud pipelines...  ...looking for a Senior Backend Engineer to build and operate the...  ...gstreamer, FCR, FFmpeg • Strong software engineering skills; knowledge... 
    Immediate start
    Remote work
    Flexible hours

    Aquabyte

    San Francisco, CA
    3 days ago
  •  ...Performs as a key contributor to an engineering team that builds and supports...  ...activities on application software; this may often require...  ...and monitoring of production health. • Produces complete, simple,...  ...impact assessment of product (hardware, software) upgrades • Assists... 

    Procyon TS

    San Francisco, CA
    3 days ago
  •  ...Senior Product Engineer Lunar is a stealth technology company building a new type of software platform for health systems. We are on a mission to revolutionize healthcare with...  ...Bridge the gap between software and hardware: Architect a next-generation integration... 
    Remote work
    Flexible hours
    3 days per week

    Lunar GMBH

    San Francisco, CA
    3 days ago
  •  ...About Flow Flow Engineering is an AI-native requirements platform...  ...We're reimagining how complex hardware is built by pairing world-...  ...is hiring a senior frontend software engineer to own core user experiences...  ...and meaningful equity. Health, dental, and vision coverage.... 
    Flexible hours

    Flow Engineering

    San Francisco, CA
    2 days ago
  • $225k

    About the Team OpenAI's Hardware organization develops silicon and system-level solutions...  ...silicon while working closely with software and research partners to co-design hardware...  ...for AI. About the Role As a software engineer on the Scaling team, you'll help build and... 
    Work at office
    Local area
    Relocation package
    3 days per week

    OpenAI

    San Francisco, CA
    7 hours ago
  •  ...About Flow Flow Engineering is an AI-native requirements platform...  ...engineering organizations, enabling hardware teams to collaborate with AI...  ...role Flow is hiring a Software Engineer with an...  ...salary and meaningful equity. Health, dental, and vision coverage.... 
    Flexible hours

    Flow Engineering

    San Francisco, CA
    3 days ago
  • $180k - $250k

     ...You are a hands-on engineer who builds the software and processes that keep a large fleet of GPU servers...  ...of servers including provisioning, health monitoring, error detection, and recovery...  ..., dashboards, and alerting for hardware health across the fleet (GPU errors,... 
    Local area
    Remote work
    Relocation package

    Fal

    San Francisco, CA
    1 day ago
  • $200k - $240k

     ...led by veteran operators and engineers, alumni of Sonos, Paypal, Tesla...  ...We're looking for a Software Engineer, Build Infrastructure...  ...Sentry), and debugging of fleet health metrics like uptime and resource...  ...autonomous vehicle, or consumer hardware space. • Deep technical... 
    Local area
    Remote work

    Sauron

    San Francisco, CA
    7 hours ago
  • $160k - $190k

     ...are seeking a full-time Senior Robotics Software Engineer to enhance the performance and...  ...collaborate closely with teams across Hardware, Infrastructure, and Machine Learning to...  ...an equal opportunity employer offering Health, dental, vision, and commuter benefits... 
    Full time
    Immediate start

    King River Capital Group

    San Francisco, CA
    3 days ago
  • $175k - $195k

     ...Description We’re looking for a Senior Software Engineer to lead the development of systems that...  ...technology you build will empower Fleet Health operators to monitor device performance...  ...role at the intersection of software, hardware, and operations - perfect for engineers... 

    Gridware Technologies Inc.

    San Francisco, CA
    2 days ago
  • $185k - $325k

     ...platform is vital to our mission. That's why we're seeking a software engineer to help us build out our trust and safety capabilities....  ...withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all... 

    OpenAI

    San Francisco, CA
    2 days ago
  • $125k - $195k

    Atomic Semi is searching for a Robotics Software Engineer in San Francisco, California. The role requires building algorithm-rich software...  ...tools, demanding deep technical challenges in robotics and hardware integration. Successful candidates will have strong programming... 

    Atomic Semi

    San Francisco, CA
    1 day ago
  • $120k - $210k

     ...About Glass Health Glass Health was physician co-founded because we believe that technology should be fully leveraged to optimize the...  ...About the role: We’re looking for experienced, problem-solving engineers across the stack (Product, Full-Stack, Backend, Applied AI/LLM)... 
    Work at office
    Remote work
    Work from home
    Worldwide
    Flexible hours
    2 days per week

    GrabJobs

    San Francisco, CA
    3 days ago
  • $342k

     ...with employer contributions to Health Savings Accounts Pre-tax...  ...conditions. About the Team OpenAI’s Hardware organization develops silicon...  ...while working closely with software and research partners to co-...  ...AI. About the Role As an Engineer on our hardware optimization... 
    Full time
    Work at office
    Local area
    Relocation package
    Flexible hours

    Centaur Labs

    San Francisco, CA
    4 days ago
  • $150k - $170k

     ...like by solving these issues through our software platform (SaaS). We combine cutting edge...  ...is committed to improving the lives and health of complex patients that have an...  ...looking for a Senior Full Stack Software Engineer who is excited about leveraging AI to drive... 
    Live in
    Remote work

    Arine

    San Francisco, CA
    7 hours ago
  • $108.7k - $181.1k

     ...accessible and affordable. Here, we focus on the health, happiness, and well-being of you and...  ...from you. Role Summary Ontada's Engineering team builds iKnowMed (iKM), the leading...  ...trial matching. We are hiring a Software Engineer III (P3) to design and build well... 
    Work experience placement
    Work at office
    Remote work
    2 days per week

    McKesson

    San Francisco, CA
    3 days ago
  •  ...Senior Full-Stack Software Engineer Location: United States - Hybrid/On-site in San Francisco or Remote Employment Type: Full-time Department...  ...: Engineering Reports to: Head of Engineering About Teal Health Teal Health is on a mission to provide women with the... 
    Full time
    Work at office
    Remote work
    Flexible hours

    Teal Health

    San Francisco, CA
    7 hours ago
  • $140k - $170k

     ...quantify fish weights, detect the health status, and generate optimal...  ...at three levels: on-site hardware for image capture, cloud pipelines...  .... The role As a Platform Engineer, you will be responsible for...  ...and optimization Strong software engineering skills; knowledge... 
    Immediate start
    Remote work
    Flexible hours

    Aquabyte

    San Francisco, CA
    5 days ago
  • $120k - $180k

     .... We've proven that we can drive better health outcomes for children and families, and...  ...affordable, high-quality healthcare requires engineering ingenuity, leadership committed to...  ...Blueberry will NEVER ask you to download software/apps or request sensitive personal information... 
    Temporary work
    Work at office
    Remote work
    Visa sponsorship
    Work visa

    GrabJobs

    San Francisco, CA
    2 days ago
  •  ...Software Engineer, Full Stack In Brief We’re a rapidly growing startup on a mission to make healthcare proactive by empowering physicians,...  ...positive impacts to frontline users within leading enterprise health systems. Who We Are Bayesian Health’s mission is to improve patient... 
    Local area

    GrabJobs

    San Francisco, CA
    1 day ago
  •  ...manual work to carry out critical internal processes, yet most health systems don't have enough resources to properly automate...  ...California or New York, New York. About the Role As a Software Engineer working on Product, you will design, build, improve user facing... 
    Work at office
    3 days per week

    Luminai, Inc

    San Francisco, CA
    3 days ago
  •  ...Khosla Ventures, World Innovation Lab, Gradient Ventures, Cone Health Ventures, and others—all backing our mission to empower...  ...have you on our team! Why Join Us: We’re seeking several Software Engineers with full stack (any mix of front end, backend, and database)... 
    Full time
    Remote work
    Flexible hours

    Rad AI

    San Francisco, CA
    3 days ago
  •  ...Department: Engineering Level: Senior (IC) Reports To: Senior Engineering Manager Version Date...  ...us in our mission to transform skin health and enhance lives—one patient at a time....  ...of the Role: The mission of the Senior Software Engineer is to architect and deliver high... 
    Contract work
    Immediate start
    Remote work
    Flexible hours

    GrabJobs

    San Francisco, CA
    7 hours ago
