Senior Platform & EngOps Engineer: GPU Cluster Automation
NVIDIA
NVIDIA Corporation in Santa Clara, CA is seeking a Senior Platform and EngOps Engineer for Cluster Operations. You will develop automated tools to manage GPU clusters and implement modern DevOps practices to ensure operational efficiency. The ideal candidate has a strong background in computer engineering or a related field, with 8+ years of relevant experience and expertise in automation. Join us to contribute to groundbreaking advancements in AI and high-performance computing.
$176k - $276k
## Senior Platform and EngOps Engineer - Cluster Operations
Locations: US, CA, Santa Clara. Time type: Full time. Posted... ...Computing and Visualization. The GPU, our invention, serves as the... ...What you will be doing: Develop automated tools to efficiently deploy, provision...Senior
- NVIDIA Gruppe in Santa Clara is seeking a technical leader for the GPU AI/HPC Infrastructure team. You will design and implement cutting-edge GPU compute clusters, focusing on deep learning and high-performance computing. The ideal candidate will have at least 5 years...Senior
- NVIDIA Gruppe in Santa Clara is looking for a Senior HPC Architect to support the deployment of large-scale GPU compute clusters. You will provide engineering solutions for GPU computing products, ensuring technical relationships with teams and assisting in creative solutions...Senior
- black.ai is looking for a skilled platform engineer in Palo Alto to enhance our AWS infrastructure... ...engineering, DevOps practices, and GPU workloads. As a platform engineer, you... ...workflows, ensure the reliability of GPU clusters, and own CI/CD pipelines, facilitating...Senior
$160k - $322k
NVIDIA Gruppe in Santa Clara is seeking a Senior Technical Marketing Engineer focused on GPUs and scale-up architecture. The role involves showcasing NVIDIA's GPU architecture and server-level platforms, aiming to maximize performance for AI applications. The ideal candidate...Senior
- ...NVIDIA Gruppe is seeking a Principal AI and ML Infra Software Engineer to join our Hardware Infrastructure team in Santa Clara, CA. In... ...efficiency by addressing infrastructure deficiencies for GPU Clusters, fostering innovations in AI/ML research. The ideal candidate...
- NVIDIA Gruppe is looking for an experienced GPU Deployment Engineer to tackle end-to-end AI deployment challenges on the NVIDIA RTX AI platform. The role involves analyzing GPU-accelerated applications, improving user experiences, and collaborating with teams to influence...Senior
$320k
A leading tech company is seeking a seasoned individual to spearhead DGX Cloud strategy, focusing on GPU lifecycle and operational health. The ideal candidate will have over 15 years in technical roles, with significant experience in cloud infrastructure and leadership....Senior
$272k - $431.25k
NVIDIA Corporation seeks a Principal AI and ML Infra Software Engineer in Santa Clara, California, to enhance the efficiency of AI/ML research on GPU Clusters. The role involves collaboration with various teams, monitoring infrastructure performance, and implementing improvements...
- NVIDIA Corporation in Santa Clara is seeking a Senior Software Engineer to lead the optimization of large-scale AI systems. This role will involve... ...include leading the debugging process of multi-GPU environments and mentoring less experienced engineers...Senior
$272k - $431.25k
...We are seeking a Principal AI and ML Infra Software Engineer, GPU Clusters at NVIDIA to join our Hardware Infrastructure team. As an Engineer... ...Python, Go, Bash, as well as familiarity with cloud computing platforms (e.g., AWS, GCP, Azure) in addition to experience with...
- NVIDIA Gruppe in Santa Clara is seeking a Senior Software Engineer to lead the optimization of distributed training across large-scale GPU platforms. Candidates should have substantial experience in AI applications and technical leadership. This role involves profiling...Senior
- NVIDIA Gruppe is seeking a ML Platform Engineer to architect and scale high-performance ML infrastructure using modern Infrastructure-as-Code practices. You will collaborate with ML researchers to build robust platforms for advanced ML model development. The ideal candidate...Senior
$184k - $356.5k
...developer to design and implement systems for GPU based Client products. This role... ...in UEFI/BIOS development on X86 or ARM platforms, along with a strong background in C/C++... ...experience and a Bachelor’s Degree in Electrical Engineering or Computer Science. The compensation...Senior
$176k - $276k
Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline... ...internal and external facing GPU cloud services run maximum... ...eliminating manual work through automation, performance tuning and... ...Observability & Telemetry collection platform with a focus on performance...Senior
- ...Description We are hiring a Senior Platform Engineer to join the Autonomous... ...service workflows, APIs, and automation that make the right path the... ...lifecycle of production‑grade clusters. Strong proficiency in... ...hardware‑level performance (GPU passthrough) and clean cloud...Senior, Work experience placement, Local area
$160k - $200k
PlusAI, based in Silicon Valley, is seeking a Senior ML Infrastructure Engineer to design scalable architectures for machine learning models. This... ...role involves building robust data pipelines, managing GPU clusters, and collaborating with cross-functional teams....Senior
- Pure Storage, Inc. is seeking a skilled software engineer to enhance its development platforms and automation services. This role focuses on improving scalability and reliability within the Santa Clara office environment. The ideal candidate will have over 5 years of experience...Senior, Work at office
- CoreWeave is hiring a Principal Engineer to lead the design and evolution of its AI infrastructure's cluster orchestration systems. The role demands expertise in Kubernetes and Slurm, working directly on technology that influences how efficiently GPUs are utilized. Your...
$184k - $287.5k
NVIDIA’s invention of the GPU in 1999 sparked the... ...and shared AI platforms that operate across desktop... .... These systems power engineering productivity, intelligent... ..., operational automation, enterprise search, knowledge... ...NVIDIA. We are seeking a Senior Staff Software...Senior
$152k - $241.5k
...Visualization. Our invention—the GPU—functions as the visual... ...We are now looking for a ML Platform Engineer to help accelerate the next... ...will be on creating reliable, automated platforms that empower... ...large-scale, distributed GPU clusters. Apply SRE principles to diagnose...Senior
- ...deeply technical, creative, and Senior AI Platform Engineer to build, support, and... ...infrastructure across cloud‑native clusters and on‑premises hardware.... ...and reduce toil. Develop automation and tooling to ensure... ...MLOps, model serving, and GPU‑accelerated environments. Experience...Senior
$245k - $295k
...Senior Manager, Infrastructure Platform Engineering Crusoe is on a mission to accelerate the abundance of energy... ...Kubernetes-based orchestration and automation to lower-level system and... ...with the operational challenges of GPU clusters, AI training, and inference workloads...Senior, Temporary work, Immediate start
- Crusoe is seeking a Senior Staff TPM in Sunnyvale, California, to lead the... ...model across hardware and engineering teams, ensuring successful customer cluster delivery. The ideal candidate will... ...management, with a deep understanding of GPU architecture and excellent...Senior
$168k - $258.75k
A leading AI technology company in Santa Clara is seeking a Senior Datacenter Technical Program Manager. In this role, you will drive the integration of cutting-edge AI systems into datacenters, coordinating with multiple teams and maintaining documentation. The ideal...Senior, Remote job
$152k - $241.5k
NVIDIA Gruppe is seeking an experienced engineer to join the Scheduling team to design and enhance GPU compute clusters for AI/ML workloads. Candidates should have a Bachelor... ..., focusing on performance optimizations and automation solutions. Salary ranges from $152,000 to $2...Senior
$152k - $287.5k
NVIDIA Corporation is seeking a motivated Performance Engineer to enhance the roadmap of communication libraries. In this role, you will conduct in-depth performance characterization on multi-GPU clusters and analyze the interaction of libraries with hardware and software...Senior
$152k - $241.5k
NVIDIA Gruppe is seeking a talented individual to optimize and benchmark GenAI inference using the latest acceleration technologies. The role involves driving industry benchmark results and architecting distributed inference systems. Required qualifications include a relevant...Senior
- NVIDIA Gruppe is seeking a Data Analyst to join their GPU-accelerated cluster team. In this role, you will analyze complex datasets to drive application and platform improvements while applying machine learning and deep learning techniques to derive actionable insights....Senior
$152k - $241.5k
NVIDIA Gruppe is seeking a motivated Performance Engineer to influence the roadmap of our communication libraries. The role involves... ...in-depth performance characterization on large multi-GPU and multi-node clusters and studying the interaction of our libraries with...Senior
