System Infrastructure / Platform Engineer, HPC Technology Department

$156.86k - $191.72k

Berkeley Lab

Overview

The National Energy Research Scientific Computing Center (NERSC) is seeking a System Infrastructure / Platform Engineer to help build and manage HPC systems and Linux-based infrastructure. NERSC operates some of the world’s largest supercomputers, supporting thousands of researchers tackling major scientific challenges.

In this role, you will manage high-performance computing environments, including HPC systems, containers, virtual machines, and core infrastructure services. You’ll work with cutting-edge technologies such as CPU/GPU clusters, parallel storage, high-speed networking, Slurm, and Kubernetes, balancing innovation with reliability, performance, and security at scale. You will collaborate with engineers, researchers, vendors, and open-source communities to develop scalable solutions that advance scientific discovery and the future of HPC.

What You Will Do
  • Build and manage Linux systems and storage infrastructure
  • Troubleshoot complex technical issues with team members
  • Install, upgrade, and secure systems and services
  • Develop and maintain scripts and automation tools
  • Participate in a 24/7 on-call rotation
  • Lead small projects, upgrades, and service rollouts
  • Collaborate with vendors to improve technologies and user experience
  • Support reliable operations of NERSC’s Perlmutter supercomputer and Spin Kubernetes platform
  • Develop and integrate services across NERSC and DOE facilities, including the upcoming Doudna supercomputer
  • Present technical work to the HPC community at conferences and industry events

Responsibilities
In addition to the Level 3 responsibilities, Level 4 adds:
  • Solve complex technical problems with independent judgment
  • Develop team strategies and project plans
  • Provide technical leadership and mentorship
  • Lead system improvements for performance, reliability, and security
  • Evaluate emerging HPC technologies
  • Represent NERSC in HPC and DOE technical communities and advocacy groups

What Is Required to Be Hired at Level 3
  • Typically, 8+ years of related experience with a Bachelor’s degree; alternatively, 6+ years with a Master’s degree; or equivalent career experience
  • 4+ years of experience managing large-scale Linux-based system deployments in a high-performance computing, cloud computing, or hyperscale environment
  • Mastery of Linux concepts and operations (processes, networking, system logs, performance)
  • Proficiency with Bash and Python scripting
  • Experience with some or all of our key technologies:
      • containers (such as Docker or Kubernetes)
      • virtualization (such as Proxmox or VMware)
      • cloud-based deployment (such as AWS, Azure, or GCP)
      • identity and access management
      • database administration, tuning, and troubleshooting
      • storage system technologies (such as iSCSI and NAS appliances)
      • parallel filesystems (such as Lustre, GPFS, or VAST)
      • high-speed networking/interconnect (such as InfiniBand, Slingshot, or RoCE)
      • advanced performance analysis and debugging tools (such as strace, lsof, eBPF, or gdb)
      • DevOps tools (such as GitLab or Jira) and processes (such as issues, merge requests, and API/automation)
  • Familiarity with automated provisioning systems (such as Chef, Foreman, or Terraform)
  • Familiarity with configuration management systems (such as Ansible or Puppet)
  • Working knowledge of Linux system engineering and security practices
  • Ability to resolve complex issues in creative and effective ways and to derive technical solutions in a collaborative environment that meet end-user requirements
  • Demonstrated ability to work independently as well as collaboratively on large projects, and to contribute to an active and respectful intellectual environment
  • Creative, positive, and collaborative work style
  • Excellent oral and written communication skills

Additional Requirements to Be Hired at Level 4
  • Typically, 12+ years of related experience with a Bachelor’s degree; alternatively, 8+ years with a Master’s degree; or equivalent career experience
  • Proven ability to lead troubleshooting and resolution of high-impact incidents in complex, large-scale environments
  • Demonstrated leadership in cross-team collaboration and mentoring
  • Experience in software engineering, Linux systems programming, or complex scripting
  • Experience managing one or more of the following:
      • data center networking (TCP/IP, Ethernet, BGP, ECMP)
      • batch workload managers (such as Slurm), including installation, configuration, routine operations, job lifecycle concepts, and troubleshooting common failure modes
      • Cray/HPE HPC ecosystems (e.g., CSM/COS, Slingshot interconnect, and related components)
  • Ability to lead and coordinate projects with traditional or Agile methodologies (such as Scrum or Kanban)
  • Ability to analyze and resolve significant and unique issues requiring evaluation of multiple intangible factors
  • Ability to exercise independent judgment in methods, techniques, and evaluation criteria for obtaining results

Additional Information

Applications will be accepted until the job posting is removed.

Appointment type: This is a full-time, career appointment, exempt (monthly paid) from overtime pay.

Salary range:
  • Level 3: The expected salary for this position is $156,864 - $191,724, which fits within the full salary range of $139,440 - $235,308, depending upon the candidate’s skills, knowledge, and abilities, including education, certifications, and years of experience.
  • Level 4: The expected salary for this position is $178,644 - $218,364, which fits within the full salary range of $158,808 - $267,996, depending upon the candidate’s skills, knowledge, and abilities, including education, certifications, and years of experience.

Background check: This position is subject to a background check. Any convictions will be evaluated to determine if they directly relate to the responsibilities and requirements of the position. Having a conviction history will not automatically disqualify an applicant from being considered for employment.

Work modality: This position requires substantial on-site presence but is eligible for a flexible work mode; hybrid schedules may be considered. Hybrid work is a combination of performing work on-site at Lawrence Berkeley National Lab, 1 Cyclotron Road, Berkeley, CA, and some telework. Individuals working a hybrid schedule must reside within 150 miles of Berkeley Lab. Work schedules are dependent on business needs. In rare cases, full-time telework or remote work modes may be considered.

Multi-level Posting: This position will be hired at a level commensurate with the business needs and the skills, knowledge, and abilities of the successful candidate.

Export Control Access: This position will involve access to hardware, commodities, and technical information subject to export control regulations including, but not limited to, the Export Administration Regulations ("EAR") and/or International Traffic in Arms Regulations ("ITAR"). Accordingly, if you are not a U.S. citizen, lawful permanent resident of the U.S. ("green card holder"), asylee, refugee, or other qualifying protected individual as defined by 8 U.S.C. 1324b(a)(3), any hiring decision may depend in part on Berkeley Lab’s ability to obtain or rely on federal government authorizations as required.

Equal Employment Opportunity Employer: The foundation of Berkeley Lab is our Stewardship Values: Team Science, Service, Trust, Innovation, and Respect; and we strive to build community with these shared values and commitments. Berkeley Lab is an Equal Opportunity Employer. We heartily welcome applications from all who could contribute to the Lab's mission of leading scientific discovery, excellence, and professionalism. In support of our rich global community, all qualified applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, protected veteran status, or other protected categories under State and Federal law.

Misconduct Disclosure Requirement: As a condition of employment, the finalist will be required to disclose if they are subject to any final administrative or judicial decisions within the last seven years determining that they committed any misconduct, are currently being investigated for misconduct, left a position during an investigation for alleged misconduct, or have filed an appeal with a previous employer.

Vacancy posted 2 days ago