Senior Platform and EngOps Engineer - Cluster Operations

$176k - $276k

Dormont Manufacturing Co

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions from artificial intelligence to autonomous cars. NVIDIA is looking for phenomenal people like you to help us accelerate the next wave of artificial intelligence.

Join our team of innovative engineers who develop and maintain software facilitating GPU communication, driving groundbreaking solutions in High Performance Computing and Deep Learning. We’re looking for highly motivated EngOps and Platform Engineers to boost execution efficiency while managing and maintaining large GPU clusters interconnected via NVLink and InfiniBand.

What you will be doing
  • Develop automated tools to efficiently deploy, provision, and maintain extensive GPU clusters interconnected via NVLink and InfiniBand.
  • Implement modern DevOps tools to automate software updates, perform maintenance tasks, and monitor cluster availability, ensuring seamless operations.
  • Take ownership of daily cluster failures and issues, troubleshooting them promptly to maintain optimal cluster availability and performance.
  • Manage the rollout and rollback of cluster software and firmware updates, ensuring smooth transitions and minimal disruptions.
  • Collaborate effectively with dynamic Engineering and Product Teams across multiple time zones to align cluster operations with evolving project requirements.

What we need to see
  • BS or MS in Computer Science, Computer Engineering, Electrical Engineering, or a related field, or equivalent experience.
  • 8+ years of hands‑on experience in deploying and administrating clusters, servers, switches, and related infrastructure.
  • Automation expert with hands‑on skills in Ansible, Python and Shell Scripting.
  • Deep understanding of operating systems, computer networks, and high‑performance applications.
  • Proven ability to work effectively with developers and test engineers across different teams and time zones.
  • Proficient with Linux fundamentals.

Ways to stand out from the crowd
  • Familiarity with resource scheduling managers, preferably Slurm.
  • Direct experience with industry standard alerting tools and emergency response practices.
  • Hands‑on experience with GPU‑focused hardware and software, such as DGX systems and Compute Clusters.
  • Proficiency in crafting and implementing a robust metrics collection and alerting infrastructure.
  • Proficiency in designing large scale networking technologies and the associated challenges.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 176,000 USD - 276,000 USD for Level 4, and 208,000 USD - 333,500 USD for Level 5. You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until March 20, 2026. This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the Senior Platform and EngOps Engineer - Cluster Operations in California, MO vacancy
  • $83.9k - $155.7k

     ...and infectious diseases, and other applications. The Senior Systems Engineer will support Next Generation Sequencing (NGS) products and become an expert on the operation of the Roche Single Molecular Sequencing Platform. Responsibilities Serve as a hands‑on key... 
    Platform
    Operations
    Senior
    Local area
    Relocation package

    F. Hoffmann-La Roche AG

    California, MO
    1 day ago
  • $160k - $200k

     ...under the Sydney Harbour Bridge to now operating globally, we’re spread across the...  ...and we’re growing. We’re looking for a Senior Sales Engineer to join our world‑class team and help...  ...bot management and fraudulent account platform to life for customers across the globe... 
    Platform
    Operations
    Senior
    Remote work
    Flexible hours

    Kasada

    California, MO
    2 days ago
  • Role Summary Oracle Health Platform Engineering builds core platform capabilities that enable Oracle...  ..., highly available services. We operate with an AI-first engineering culture—engineers...  ..., and operations. We are seeking a Senior Software Developer (IC3) to design, develop... 
    Platform
    Operations
    Senior
    Visa sponsorship

    Ll Oefentherapie

    California, MO
    1 day ago
  • Our Deloitte AI & Engineering team works to transform technology platforms, drive innovation, and help make a significant...  ...professionals reimagining and reengineering operations and processes that are critical...  ...innovation. Work you'll do As a Senior OpenAI FDE, you will work side... 
    Platform
    Operations
    Senior

    PowerToFly

    California, MO
    5 days ago
  •  ...computation. About The Role As a Kernel Engineer on our team, you will develop high-performance...  ..., optimizing, and scaling deep learning operations to fully leverage our custom, massively...  ...Cerebras: Build a breakthrough AI platform beyond the constraints of the GPU. Publish... 
    Platform
    Operations
    Senior

    Cerebras

    California, MO
    5 days ago
  • $66.52 - $88.14 per hour

    Stanford Health Care seeks a Cloud Engineer in California to manage the Enterprise Information Management platform. The role requires expertise in Azure and Databricks,...  ...experience, and a strong understanding of data operations. You will lead automation projects, ensure... 
    Platform
    Operations
    Senior
    Hourly pay

    Stanford Health Care

    California, MO
    1 day ago
  •  ...live entertainment company is seeking a Cloud Engineer to manage and administer SQL Server...  ...role involves designing and running database platforms, troubleshooting database issues, and supporting large scale operations. Candidates should have strong expertise in... 
    Platform
    Operations
    Senior

    Live Nation International

    California, MO
    3 days ago
  • $133.95k - $165k

     ...warehouse logistics. As a Field Service Engineer, you will help to strengthen our world‑class...  ...across Sales, Engineering, and Service Operations to drive technical solutions and resolve...  ...with highly specialized and complex platform technology. Customer management and communication... 
    Platform
    Operations
    Senior
    Visa sponsorship

    Boston Dynamics, Inc.

    California, MO
    3 days ago
  • Job Description: Senior Core Banking Engineer - XGEN & IBM i Data Specialist Function: Core Banking Engineering...  ...extraction, and output generation platform used within IBM i / AS400-based...  ...at the intersection of core banking operations, data engineering, and regulatory... 
    Platform
    Operations
    Senior
    Remote job
    Full time
    Bank staff

    ESP Engineered

    California, MO
    4 days ago
  •  ...1456 Job Summary F3EA is seeking a Senior CI/CD & Integration Engineer to support the Blue Water Instrumentation...  ...Engineer will establish and operate the automated infrastructure that enables...  ...tools, knowledge management platforms, and business process automation solutions... 
    Platform
    Operations
    Senior
    Full time
    Apprenticeship
    Work at office
    Local area

    F3EA Inc

    California, MO
    3 days ago
  • $130k - $180k

    Senior IT Systems and Automation Engineer About Moon An ambitious and independent stealth SaaS company incubated by...  ...experience and deliver operational excellence for businesses across the world through a unified platform supercharged with proprietary AI agents... 
    Platform
    Operations
    Senior
    Contract work
    For contractors
    Work at office

    Moon

    California, MO
    4 days ago
  • $72 - $89 per hour

     ...motivated and technically accomplished Senior Process Engineer, MSAT to serve as a critical...  ...establishment and maturation of GMP‑ready platform processes that form the foundation of...  ...internal manufacturing capabilities, operating with a high degree of autonomy across... 
    Platform
    Operations
    Senior
    Full time
    Contract work

    GeneFab

    California, MO
    5 days ago
  •  ...you. Open up opportunities with HPE. Senior Presales Systems Engineer Job Family Definition: Responsible...  ...security, automation, and AI‑driven operations solutions. The preferred candidate...  ...Access Assurance, Cisco ISE, or similar platforms Solid working knowledge of data... 
    Platform
    Operations
    Senior
    Work experience placement
    Work at office
    Remote work
    Work from home

    HPE Aruba Networking

    California, MO
    5 days ago
  • $72 - $86 per hour

    GeneFab is seeking a Senior Manufacturing Engineer to lead the identification, implementation, and management...  ...across our GMP manufacturing operations. This role will serve as the primary...  ...systems such as batch record (EBR) platform to replace paper batch records across... 
    Platform
    Operations
    Senior
    Full time
    Contract work
    Apprenticeship

    GeneFab

    California, MO
    4 days ago
  • At Commure, we're building the AI Operating System for healthcare, the foundation that defines...  ..., documented, and financed. Our platform spans the full care journey: Ambient AI...  ...anything touches production. Scale a rule engine that runs hundreds of configurable conditions... 
    Platform
    Operations
    Senior
    Work at office
    Immediate start

    COMMURE Incorporated

    California, MO
    4 days ago
  • $170k - $200k

     ...missions in every domain. Umbra’s ecosystem operates through three business units: Remote...  ...), and Mission Solutions (the platforms). Together, our teams develop capabilities...  ...Umbra’s Radar Processing Group as a Senior Software Engineer. In this pivotal role, you will play... 
    Platform
    Operations
    Senior
    Permanent employment
    Work at office
    Local area
    Remote work
    Worldwide
    Flexible hours

    Umbra

    California, MO
    1 day ago
  • $208k - $333.5k

    Systems Engineering is an engineering discipline focused on building, automating, and operating the platforms and tooling that deliver large-scale production systems with high efficiency, reliability, and velocity. It combines software and systems engineering practices... 
    Platform
    Operations
    Senior

    NVIDIA Gruppe

    California, MO
    2 days ago
  • Senior Instrumentation & Controls Engineer...  ...This role is part of Blue Origin Operations, which is comprised of Integrated...  ..., control systems, and automation platforms that allow a crew to safely monitor... 
    Platform
    Operations
    Senior
    For contractors
    Work at office
    Relocation

    Blue Origin LLC

    California, MO
    5 days ago
  • $112.6k - $168.85k

     ...software, analytics, Site reliability engineers, Cloud Operations, Medical, Marketing, Data engineering...  ...mobile applications on Android/iOS platforms a plus • Experience reviewing verification...  ...status as a protected veteran. The Senior Systems Engineer is a member of the... 
    Platform
    Operations
    Senior
    Work experience placement
    Work at office

    Insulet Corporation

    California, MO
    5 days ago
  • $116k - $170k

    Senior Software Engineer, Windows Sensor - CTIO (Hybrid) | CrowdStrike | Full Time | Senior | Hybrid | CA | Posted...  ...the world’s most advanced AI-native platform. Our customers span all industries, and...  ...second to provide deep visibility into operations on the endpoint, and performs rich... 
    Platform
    Operations
    Senior
    Full time
    Work experience placement
    Work at office
    Local area
    Remote work

    TryApplyNow

    California, MO
    3 days ago
  • $157.5k - $254.35k

     ...’s Intelligent Agreement Management platform, companies can create, commit, and manage...  ...self‑motivated, driven and creative Senior Site Reliability Engineer to join the Site Reliability team....  ...to eliminate toil and reduce operational risk, drive improvements in observability... 
    Platform
    Operations
    Senior
    Contract work
    Work at office
    Local area
    Remote work

    DocuSign, Inc.

    California, MO
    5 days ago
  • $119.3k - $145k

     ...Tandem Diabetes Care is hiring a Senior Software Test Engineer I to lead test projects and ensure quality...  .... Oversee documentation of test operations and report results to software engineering...  ...of bug fixes across the software platform. Develop and assist with... 
    Platform
    Operations
    Senior
    Local area
    Remote work
    Flexible hours
    2 days per week
    3 days per week

    Tandem Diabetes

    California, MO
    1 day ago
  • $250.5k - $335.9k

    Sr Principal Site Reliability Engineer - Media Engineering Req ID: 10144162 Location: San...  ...Engineering team to build a high‑availability platform that delivers streaming, advertising,...  ...incidents. Partner with Infrastructure, Operations, Product, and Development teams to... 
    Platform
    Operations
    Senior

    5014 Disney Entertainment & Sports LLC

    California, MO
    5 days ago
  •  ...Across North America, ASR Group® companies operate five sugar refineries, located in...  ...sugarcane. OVERVIEW The Sr. Packaging Controls Engineer is responsible for maintaining...  ...more of the following: Wonderware System Platform & InTouch, Allen Bradley PLC systems (PLC... 
    Platform
    Operations
    Senior
    For contractors
    Work experience placement
    Local area

    ASR Group

    California, MO
    4 days ago
  • $101.97k - $203.94k

     ...one community at a time. Job Purpose and Summary: As the Sr. Quality Engineer IT Wellpartner you will lead various teams supporting the ongoing operation and evolution of the Wellpartner platform including datacenter, network, infrastructure operations and engineering... 
    Platform
    Operations
    Senior
    Hourly pay
    Full time
    Temporary work
    Local area

    Hispanic Alliance for Career Enhancement

    California, MO
    4 days ago
  •  ...company. We are currently looking for a Senior Unix System Engineer - REMOTE. In this pivotal role, you...  ...directly to the successful operations of our partner. The role is essential...  ...configure, and patch RHEL on various platforms Utilize Splunk for log analysis and... 
    Platform
    Operations
    Senior
    Remote job
    Temporary work
    Flexible hours

    Jobgether

    California, MO
    5 days ago
  •  ...We are seeking a Sr. Fill-Finish Process Engineer with strong experience in aseptic...  ...along with a strong understanding of GMP operations and regulatory expectations. This position...  ...systems. Experience with digital validation platforms such as KNEAT is a plus. Ensure all... 
    Platform
    Operations
    Senior
    Work at office
    Remote work
    Visa sponsorship
    Work visa
    Flexible hours

    Initial Therapeutics, Inc.

    California, MO
    5 days ago
  • $140k - $190k

     ...Software Engineer - Robotics Perception Sensors Seeking a skilled...  ...directly influence how autonomous platforms make decisions in dynamic...  ...with autonomy, UI, and operations teams to visualize and interpret...  ...algorithms (segmentation, clustering, Kalman filters, etc.) Collaborative... 
    Platform
    Operations
    Work at office
    Flexible hours

    Kismet Search

    California, MO
    2 days ago
  • $132k - $207k

    NVIDIA is seeking a highly skilled QA Engineer to join our Workstation and Virtualization...  ...Validate NVIDIA products on customer‑specific platforms and configurations to ensure...  ...architecture, supercomputers, and computer clusters, including caches, buses, memory controllers... 
    Platform
    Senior
    Remote work
    Flexible hours

    Dormont Manufacturing Co

    California, MO
    2 days ago
  • $112.9k - $155.24k

    Scope/Purpose of Position The Senior Systems Engineer is responsible for working...  ...the systems in which they operate, and how Corning products...  ...network architects and ASIC/platform engineers Perform system level...  ...‑out), including GPU/XPU cluster designs and optical I/O... 
    Platform
    Senior
    Full time
    Work experience placement

    Corning Inc.

    California, MO
    1 day ago
