HPC Infrastructure Platform Engineer
Oak Ridge National Laboratory
Requisition Id 16521
Overview:
Major Duties/Responsibilities:
Linux Administration:
- Deploy, configure and manage HPC-scale services in a Linux environment, primarily Red Hat and Rocky Linux
- Perform regular patches, updates and backups
- Monitor systems using tools like Nagios and Grafana
- Respond to and assist in troubleshooting issues
- Build and maintain foundational internal platforms and tools to enable the HPC Infrastructure team to reliably deploy, monitor and scale applications
- Design standardized and automated workflow patterns, build and maintain CI/CD pipelines
- Offer self-service, excellent documentation and assistance to HPC Infrastructure group members for efficient consumption of platform services
- Develop, maintain and review high quality code for internal tools using programming languages such as Python, Golang, or Rust
- Deploy, configure and support identity and access management services using LDAP and PingFederate
- Maintain and enable secure access for human users and automated workloads in Kubernetes
- Deploy and manage resources in the NCCS VMware environment
- Identify potential automation targets and lead efforts to automate processes
- Define policies and procedures for automation and configuration management for the team and organization as a whole
- Lead small Infrastructure projects through the project lifecycle
- Mentor and train junior staff by creating training documentation, holding knowledge-sharing sessions, and fostering skill growth throughout the team
- Propose and implement improvements to existing Infrastructure systems as well as new systems, processes and procedures
Qualifications:
- Bachelor's degree in computer science or a closely related field and a minimum of 5 years of experience in Linux systems and Kubernetes platform administration, or a master's degree and a minimum of 4 years of experience in Linux systems and Kubernetes platform administration
- An equivalent combination of education and experience will be considered
- Excellent interpersonal/communication skills and the ability to work within a team
- Strong experience designing, building and maintaining Kubernetes platform tools
- Strong working knowledge of Linux system fundamentals and common network protocols
- Programming and scripting skills in common languages such as Python and Bash
- Understanding of versioning and code review tools like GitHub and GitLab
- Experience implementing and supporting highly available systems and services
- Experience with configuration management tools such as Puppet or Ansible
- Experience deploying and maintaining virtual environments using VMware
- Experience deploying, maintaining and troubleshooting a variety of infrastructure services such as OpenLDAP, DNS, DHCP, etc.
- Ability to plan, prioritize and complete assigned projects with minimal supervision
- This position requires the ability to obtain and maintain a clearance from the Department of Energy. As such, this position is a Workplace Substance Abuse Program (WSAP) testing designated position. WSAP positions require passing a pre-placement drug test and participation in an ongoing random drug testing program


