Senior Platform and EngOps Engineer - Cluster Operations
$176k - $276k
Dormont Manufacturing Company
NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions, from artificial intelligence to autonomous cars. NVIDIA is looking for phenomenal people like you to help us accelerate the next wave of artificial intelligence.

Join our team of innovative engineers who develop and maintain software facilitating GPU communication, driving groundbreaking solutions in High-Performance Computing and Deep Learning. We’re looking for highly motivated EngOps and Platform Engineers to boost execution efficiency while managing and maintaining large GPU clusters interconnected via NVLink and InfiniBand.

What you will be doing
- Develop automated tools to efficiently deploy, provision, and maintain extensive GPU clusters interconnected via NVLink and InfiniBand.
- Implement modern DevOps tools to automate software updates, perform maintenance tasks, and monitor cluster availability, ensuring seamless operations.
- Take ownership of daily cluster failures and issues, troubleshooting them promptly to maintain optimal cluster availability and performance.
- Manage the rollout and rollback of cluster software and firmware updates, ensuring smooth transitions and minimal disruptions.
- Collaborate effectively with dynamic Engineering and Product Teams across multiple time zones to align cluster operations with evolving project requirements.

What we need to see
- BS or MS in Computer Science, Computer Engineering, Electrical Engineering, or a related field, or equivalent experience.
- 8+ years of hands‑on experience in deploying and administering clusters, servers, switches, and related infrastructure.
- Automation expert with hands-on skills in Ansible, Python, and shell scripting.
- Deep understanding of operating systems, computer networks, and high‑performance applications.
- Proven ability to work effectively with developers and test engineers across different teams and time zones.
- Proficient with Linux fundamentals.

Ways to stand out from the crowd
- Familiarity with resource scheduling managers, preferably Slurm.
- Direct experience with industry-standard alerting tools and emergency response practices.
- Hands‑on experience with GPU‑focused hardware and software, such as DGX systems and Compute Clusters.
- Proficiency in crafting and implementing a robust metrics collection and alerting infrastructure.
- Proficiency in designing large-scale networking technologies and handling the associated challenges.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 176,000 USD - 276,000 USD for Level 4, and 208,000 USD - 333,500 USD for Level 5. You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until March 20, 2026. This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

