Senior HPC Cluster Engineer
$152k - $241.5k
NVIDIA
## Senior HPC Cluster Engineer

* Locations: US, CA, Santa Clara; US, TX, Austin; US, WA, Redmond
* Time type: Full time
* Posted on: Yesterday
* Job requisition ID: JR2014289

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent. As an NVIDIAN, you'll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

We are seeking a highly skilled and experienced HPC Cluster Engineer to design, deploy, and operate GPU Compute Clusters for EDA (Electronic Design Automation) and high-performance computing workloads used across multiple teams and projects. Join our engineering team and collaborate with researchers and infrastructure teams to ensure our GPU clusters are highly performant, scalable, and reliable.

**What you'll be doing:**

* Develop and enhance our ecosystem around GPU-accelerated computing, including developing scalable automation solutions.
* Continuously improve infrastructure provisioning, management, observability, and day-to-day operation through automation.
* Provide technical leadership and strategic guidance for managing large-scale HPC systems, including the deployment of compute, networking, and storage.
* Foster strong customer and multi-functional partnerships to ensure consistent cluster support and rapidly adapt to evolving user needs.
* Support our researchers in running their EDA workloads, including performance analysis and optimizations.
* Conduct root cause analysis and suggest corrective action. Proactively find and fix issues before they occur.
* Build innovative tooling to accelerate researchers' velocity, debugging, and software performance at scale.

**What we need to see:**

* Bachelor's degree in Computer Science, Electrical Engineering, or a related field, or equivalent experience.
* Minimum of 5 years of proven experience crafting and operating large-scale compute infrastructure, including cluster configuration management tools such as BCM or Ansible.
* Experience with AI/HPC job schedulers and orchestrators, such as Slurm, LSF, PBS, or K8s. Applied experience with AI/HPC workflows that use MPI and NCCL.
* Proficiency with Linux, including Rocky/CentOS/RHEL and/or Ubuntu distributions. A solid understanding of container technologies such as Enroot and Docker.
* Proficiency in Python and Bash.
* Experience analyzing and tuning performance for a variety of EDA workloads. Excellent problem-solving skills, with the ability to analyze complex systems, identify bottlenecks, and implement scalable solutions.
* Excellent communication and collaboration skills, with the ability to work effectively with various teams and individuals.
* Passion for continual learning and staying ahead of new technologies and effective approaches in the HPC infrastructure fields.

**Ways to stand out from the crowd:**

* Background with NVIDIA GPUs, CUDA programming, NCCL, and MLPerf benchmarking.
* Experience supporting EDA workloads and tools.
* Familiarity with high-speed networking for HPC, including InfiniBand, RDMA, and RoCE.
* Understanding of fast, distributed storage systems such as Lustre and GPFS for AI/HPC workloads.
* Familiarity with metrics collection and visualization at scale with Prometheus, OpenSearch, and Grafana.

Our technology has no boundaries! NVIDIA is building the most groundbreaking and powerful compute platforms for the world to use. It's because of our work that scientists, researchers, and engineers can advance their ideas. At its core, our visual computing technology not only enables an amazing computing experience, but is also energy efficient! We pioneered a supercharged form of computing loved by the most demanding computer users in the world: scientists, designers, artists, and gamers. It's not just technology, though! It is our people, some of the brightest in the world, and our diverse company culture that make NVIDIA one of the most fun, innovative, and dynamic places to work in the world! At the center of NVIDIA's culture are our core values, like innovation, excellence, determination, and teamwork, that guide us to be the best we can be.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until March 15, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.

