System Engineer, GPU Fleet

$200k - $300k

Fluidstack

As a System Engineer, GPU Fleet, you will manage, operate, and optimize hyperscale GPU compute infrastructure supporting AI/ML training and inference workloads. You will ensure the high availability, performance, and reliability of the GPU server fleet through automation, monitoring, troubleshooting, and collaboration with hardware engineering, platform teams, and datacenter operations.

Focus
  • Operate and maintain a large-scale GPU server fleet (H100, B200, GB200) supporting AI/ML workloads; monitor system health, performance, and utilization to maximize uptime and ensure SLA compliance
  • Perform hands-on troubleshooting and root cause analysis of complex hardware, firmware, OS, and application issues across GPU clusters; coordinate with vendors and hardware teams to resolve systemic failures
  • Develop and maintain automation scripts for provisioning, configuration management, monitoring, and remediation at scale
  • Build and improve tooling for GPU health checks, performance diagnostics, driver validation, and automated recovery
  • Execute server provisioning, configuration, firmware updates, and OS installation using automation frameworks; manage lifecycle operations including deployment, maintenance, and decommissioning
  • Participate in 24x7 on-call rotation; respond to production incidents and coordinate resolution with cross-functional teams including datacenter operations, network engineering, and application teams
  • Lead post-incident reviews, document root causes, and drive continuous improvement initiatives focused on automation, reliability, monitoring, and operational efficiency
Basic Qualifications
  • Bachelor's degree in Computer Science, Engineering, or related technical field (or equivalent practical experience)
  • 3+ years (System Engineer) or 5+ years (Senior System Engineer) in Linux system administration, datacenter operations, or infrastructure engineering
  • Strong Linux/Unix fundamentals including system administration, shell scripting (Bash, Python), troubleshooting, and performance tuning
  • Experience with server hardware architecture, troubleshooting techniques, and understanding of compute, memory, storage, and networking components
  • Experience in automation and configuration management tools (Ansible, Puppet, Chef, Terraform).
  • Strong analytical and problem-solving skills with ability to diagnose complex technical issues under pressure
  • Excellent communication and collaboration skills; ability to work effectively with cross-functional teams
Preferred Qualifications
  • Experience managing large-scale GPU infrastructure (NVIDIA H100, A100, B200, GB200) in production environments supporting AI/ML workloads
  • Deep knowledge of GPU architecture, the CUDA toolkit, GPU drivers, and monitoring tools (nvidia-smi, DCGM)
  • Experience with HPC cluster management, job schedulers (Slurm, PBS, LSF), and container orchestration (Kubernetes, Docker)
  • Proficiency in out-of-band management (BMCs, IPMI, Redfish) and firmware management for server hardware
  • Experience with high-performance networking (InfiniBand, RoCE, RDMA) and network troubleshooting in GPU cluster environments
  • Familiarity with datacenter operations including rack installations, cabling, power management, and thermal considerations
Salary & Benefits
  • Competitive total compensation package (salary + equity).
  • Retirement or pension plan, in line with local norms.
  • Health, dental, and vision insurance.
  • Generous PTO policy, in line with local norms.

The base salary range for this position is $200,000 - $300,000 per year, depending on experience, skills, qualifications, and location. This range represents our good faith estimate of the compensation for this role at the time of posting. Total compensation may also include equity in the form of stock options.

We are committed to pay equity and transparency.

Fluidstack is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, protected veteran status, or any other characteristic protected by law. Fluidstack will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

Vacancy posted 7 hours ago