System Infrastructure / Platform Engineer, HPC Technology Department
$156.86k - $191.72k
Lawrence Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is seeking a System Infrastructure / Platform Engineer to help build and manage HPC systems and Linux-based infrastructure. NERSC operates some of the world's largest supercomputers, supporting thousands of researchers tackling major scientific challenges.

In this role, you will manage high-performance computing environments, including HPC systems, containers, virtual machines, and core infrastructure services. You'll work with cutting-edge technologies such as CPU/GPU clusters, parallel storage, high-speed networking, Slurm, and Kubernetes, balancing innovation with reliability, performance, and security at scale. Collaborating with engineers, researchers, vendors, and open-source communities, you will help develop scalable solutions that advance scientific discovery and the future of HPC.

If you have Linux experience, an interest in science, and enjoy fast-paced collaborative environments, NERSC would love to hear from you.

We're here for the same mission, to bring science solutions to the world. Join our team and YOU will play a supporting role in our goal to address global challenges! Have a high level of impact and work for an organization associated with 17 Nobel Prizes!

Why join Berkeley Lab?

We invest in our employees by offering a total rewards package you can count on:
- Exceptional health and retirement benefits, including pension or 401(k)-style plans
- Opportunities to grow in your career - check out our Tuition Assistance Program
- A culture where you'll belong - we are invested in our teams!
- In addition to accruing vacation and sick time, we also have a Winter Holiday Shutdown every year.
- Parental bonding leave (for both mothers and fathers)
- Pet insurance

What You Will Do if hired at a Level 3:
- Build and manage Linux systems and storage infrastructure
- Troubleshoot complex technical issues with team members
- Install, upgrade, and secure systems and services
- Develop and maintain scripts and automation tools
- Participate in a 24/7 on-call rotation
- Lead small projects, upgrades, and service rollouts
- Collaborate with vendors to improve technologies and user experience
- Support reliable operations of NERSC's Perlmutter supercomputer and Spin Kubernetes platform
- Develop and integrate services across NERSC and DOE facilities, including the upcoming Doudna supercomputer
- Present technical work to the HPC community at conferences and industry events

Additional Responsibilities if hired at a Level 4:
- Solve complex technical problems with independent judgment
- Develop team strategies and project plans
- Provide technical leadership and mentorship
- Lead system improvements for performance, reliability, and security
- Evaluate emerging HPC technologies and capabilities
- Represent NERSC in HPC and DOE technical communities and advocacy groups

What is Required to be hired at a Level 3:
- Typically, 8+ years of related experience with a Bachelor's degree; alternatively, 6+ years with a Master's degree; or equivalent career experience
- 4+ years of experience managing large-scale Linux-based system deployments in a high-performance computing, cloud computing, or hyper-scale environment
- Mastery of Linux concepts and operations (processes, networking, system logs, performance)
- Proficiency with bash and Python scripting
- Experience with some or all of our key technologies:
  - containers (such as Docker or Kubernetes)
  - virtualization (such as Proxmox or VMware)
  - cloud-based deployment (such as AWS, Azure, or GCP)
  - identity and access management
  - database administration, tuning, and troubleshooting
  - storage systems technologies (such as iSCSI and NAS appliances)
  - parallel filesystems (such as Lustre, GPFS, or VAST)
  - high-speed networking/interconnect (such as InfiniBand, Slingshot, or RoCE)
  - advanced performance analysis and debugging tools (such as strace, lsof, eBPF, or gdb)
  - DevOps tools (such as GitLab or Jira) and processes (such as issues, merge requests, and API/automation)
- Familiarity with automated provisioning systems (such as Chef, Foreman, or Terraform)
- Familiarity with configuration management systems (such as Ansible or Puppet)
- Working knowledge of Linux system engineering and security practices
- Ability to resolve complex issues in creative and effective ways and derive technical solutions in a collaborative environment to meet end-user requirements or needs
- Demonstrated ability to work independently as well as collaboratively in large projects, and contribute to an active and respectful intellectual environment
- Creative, positive, and collaborative work style
- Excellent oral and written communication skills

Additional Requirements to be hired at a Level 4:
- Typically, 12+ years of related experience with a Bachelor's degree; alternatively, 8+ years with a Master's degree; or equivalent career experience
- Proven ability to lead troubleshooting and resolution of high-impact incidents in complex, large-scale environments
- Demonstrated leadership in cross-team collaboration and mentoring
- Experience in software engineering, Linux systems programming, or complex scripting
- Experience managing one or more of the following:
  - data center networking (TCP/IP, Ethernet, BGP, ECMP)
  - batch workload managers (such as Slurm), including installation, configuration, routine operations, job lifecycle concepts, and troubleshooting common failure modes
  - Cray/HPE HPC ecosystems (e.g., CSM/COS, Slingshot interconnect, and related components)
- Ability to lead and coordinate projects with traditional or Agile methodologies (such as Scrum or Kanban)
- Ability to analyze and resolve significant and unique issues requiring evaluation of multiple intangible factors
- Ability to exercise independent judgment in methods, techniques, and evaluation criteria for obtaining results

Additional information:

Applications will be accepted until the job posting is removed.

Appointment type: This is a full-time, career appointment, exempt (monthly paid) from overtime pay.

Salary range:
- Level 3: The expected salary for this position is $156,864 - $191,724, which fits into the full salary range of $139,440 - $235,308 depending upon the candidate's skills, knowledge, and abilities. This includes education, certifications, and years of experience.
- Level 4: The expected salary for this position is $178,644 - $218,364, which fits into the full salary range of $158,808 - $267,996 depending upon the candidate's skills, knowledge, and abilities. This includes education, certifications, and years of experience.

Background check: This position is subject to a background check. Any convictions will be evaluated to determine if they directly relate to the responsibilities and requirements of the position. Having a conviction history will not automatically disqualify an applicant from being considered for employment.

Work modality: This position requires substantial on-site presence, but is eligible for a flexible work mode, and hybrid schedules may be considered. Hybrid work is a combination of performing work on-site at Lawrence Berkeley National Lab, 1 Cyclotron Road, Berkeley, CA and some telework. Individuals working a hybrid schedule must reside within 150 miles of Berkeley Lab. Work schedules are dependent on business needs. In rare cases, full-time telework or remote work modes may be considered.

Multi-level Posting: This position will be hired at a level commensurate with the business needs and the skills, knowledge, and abilities of the successful candidate.

Export Control Access: This position will involve access to hardware, commodities, and technical information subject to export control regulations including, but not limited to, the Export Administration Regulations ("EAR") and/or International Traffic in Arms Regulations ("ITAR"). Accordingly, any hiring decision may depend in part on Berkeley Lab's ability to obtain or rely on federal government authorizations as required, if you are not a U.S. citizen, lawful permanent resident of the U.S. ("green card holder"), asylee, refugee, or other qualifying protected individual as defined by 8 U.S.C. 1324b(a)(3).

Want to learn more about working at Berkeley Lab? Please visit: careers.lbl.gov

Equal Employment Opportunity Employer: The foundation of Berkeley Lab is our Stewardship Values: Team Science, Service, Trust, Innovation, and Respect; and we strive to build community with these shared values and commitments. Berkeley Lab is an Equal Opportunity Employer. We heartily welcome applications from all who could contribute to the Lab's mission of leading scientific discovery, excellence, and professionalism. In support of our rich global community, all qualified applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, protected veteran status, or other protected categories under State and Federal law.

Misconduct Disclosure Requirement: As a condition of employment, the finalist will be required to disclose if they are subject to any final administrative or judicial decisions within the last seven years determining that they committed any misconduct, are currently being investigated for misconduct, left a position during an investigation for alleged misconduct, or have filed an appeal with a previous employer.


