HPC Infrastructure Platform Engineer
Oak Ridge National Laboratory
Requisition Id 16521
Major Duties/Responsibilities:
Linux Administration:
- Deploy, configure and manage HPC-scale services in a Linux environment, primarily Red Hat and Rocky
- Perform regular patches, updates and backups
- Monitor systems using tools like Nagios and Grafana
- Respond to and assist in troubleshooting issues
- Build and maintain foundational internal platforms and tools to enable the HPC Infrastructure team to reliably deploy, monitor and scale applications
- Design standardized and automated workflow patterns, build and maintain CI/CD pipelines
- Offer self-service, excellent documentation and assistance to HPC Infrastructure group members for efficient consumption of platform services
- Develop, maintain and review high-quality code for internal tools using programming languages such as Python, Golang, or Rust
- Deploy, configure and support identity and access management services using LDAP and PingFederate
- Maintain and enable secure access for human users and automated workloads in Kubernetes
- Deploy and manage resources in the NCCS VMware environment
- Identify potential automation targets and lead efforts to automate processes
- Define policies and procedures for automation and configuration management for the team and organization as a whole
- Lead small Infrastructure projects through the project lifecycle
- Mentor and train junior staff by creating training documentation, holding knowledge-sharing sessions, and fostering skill growth throughout the team
- Propose and implement improvements to existing Infrastructure systems as well as new systems, processes and procedures
Qualifications:
- Bachelor's degree in computer science or closely related field and a minimum of 5 years of experience in Linux systems and Kubernetes platform administration, or a master's degree and a minimum of 4 years of experience in Linux systems and Kubernetes platform administration
- An equivalent combination of education and experience will be considered
- Excellent interpersonal/communication skills and the ability to work within a team
- Strong experience designing, building and maintaining Kubernetes platform tools
- Strong working knowledge of Linux system fundamentals and common network protocols
- Programming and scripting skills in common languages such as Python and bash
- Understanding of versioning and code review tools like GitHub and GitLab
- Experience implementing and supporting highly available systems and services
- Experience with configuration management tools such as Puppet or Ansible
- Experience deploying and maintaining virtual environments using VMware
- Experience deploying, maintaining and troubleshooting a variety of infrastructure services such as OpenLDAP, DNS, DHCP, etc.
- Ability to plan, prioritize and complete assigned projects with minimal supervision
- This position requires the ability to obtain and maintain a clearance from the Department of Energy. As such, this position is a Workplace Substance Abuse (WSAP) testing designated position. WSAP positions require passing a pre-placement drug test and participation in an ongoing random drug testing program

