System Engineer, GPU Fleet
Fluidstack
$200k - $300k
As a System Engineer, GPU Fleet, you will manage, operate, and optimize hyperscale GPU compute infrastructure supporting AI/ML training and inference workloads. You will ensure the high availability, performance, and reliability of the GPU server fleet through automation, monitoring, troubleshooting, and collaboration with hardware engineering, platform teams, and datacenter operations.
Focus
- Operate and maintain a large-scale GPU server fleet (H100, B200, GB200) supporting AI/ML workloads; monitor system health, performance, and utilization to maximize uptime and ensure SLA compliance
- Perform hands-on troubleshooting and root cause analysis of complex hardware, firmware, OS, and application issues across GPU clusters; coordinate with vendors and hardware teams to resolve systemic failures
- Develop and maintain automation scripts for provisioning, configuration management, monitoring, and remediation at scale
- Build and improve tooling for GPU health checks, performance diagnostics, driver validation, and automated recovery
- Execute server provisioning, configuration, firmware updates, and OS installation using automation frameworks; manage lifecycle operations including deployment, maintenance, and decommissioning
- Participate in 24x7 on-call rotation; respond to production incidents and coordinate resolution with cross-functional teams including datacenter operations, network engineering, and application teams
- Lead post-incident reviews, document root causes, and drive continuous improvement initiatives focused on automation, reliability, monitoring, and operational efficiency
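To give a concrete sense of the GPU health-check tooling described above, here is a minimal sketch. It assumes the checks are built on `nvidia-smi` CSV queries. The thresholds, the `check_gpus` and `check_node` helpers, and the `GpuStatus` record are hypothetical illustrations, not Fluidstack's actual tooling. Real limits depend on the GPU model and on site policy.

```python
import csv
import io
import subprocess
from dataclasses import dataclass, field
from typing import List, Optional

# Fields passed to `nvidia-smi --query-gpu=...`; noheader/nounits yields bare values.
QUERY_FIELDS = "index,temperature.gpu,ecc.errors.uncorrected.volatile.total"
NVIDIA_SMI_CMD = ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"]

# Hypothetical thresholds; tune per GPU model and site policy.
MAX_TEMP_C = 85
MAX_UNCORRECTED_ECC = 0


@dataclass
class GpuStatus:
    index: int
    temp_c: Optional[int]           # None if the driver reports e.g. "[N/A]"
    uncorrected_ecc: Optional[int]  # None if ECC counters are unavailable
    problems: List[str] = field(default_factory=list)


def _to_int(value: str) -> Optional[int]:
    """Parse a numeric field, mapping placeholders like "[N/A]" to None."""
    try:
        return int(value.strip())
    except ValueError:
        return None


def check_gpus(csv_text: str) -> List[GpuStatus]:
    """Evaluate each GPU row of nvidia-smi CSV output against the thresholds."""
    statuses = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not row:  # skip blank lines
            continue
        status = GpuStatus(int(row[0]), _to_int(row[1]), _to_int(row[2]))
        if status.temp_c is not None and status.temp_c > MAX_TEMP_C:
            status.problems.append(f"temperature {status.temp_c}C > {MAX_TEMP_C}C")
        if status.uncorrected_ecc is not None and status.uncorrected_ecc > MAX_UNCORRECTED_ECC:
            status.problems.append(f"uncorrected ECC errors: {status.uncorrected_ecc}")
        statuses.append(status)
    return statuses


def check_node() -> List[GpuStatus]:
    """Run nvidia-smi on the local node and check every GPU it reports."""
    out = subprocess.run(NVIDIA_SMI_CMD, capture_output=True, text=True, check=True).stdout
    return check_gpus(out)
```

On a GPU node, `[g for g in check_node() if g.problems]` would list the unhealthy devices. Automated-recovery tooling could then act on that list, for example by draining the node from the scheduler. Parsing is kept separate from the `subprocess` call so the checks can be tested without GPUs.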
Basic Qualifications
- Bachelor's degree in Computer Science, Engineering, or related technical field (or equivalent practical experience)
- 3+ years (System Engineer) or 5+ years (Senior System Engineer) of experience in Linux system administration, datacenter operations, or infrastructure engineering
- Strong Linux/Unix fundamentals, including system administration, scripting (Bash, Python), troubleshooting, and performance tuning
- Experience with server hardware architecture and troubleshooting, including a working understanding of compute, memory, storage, and networking components
- Experience with automation and configuration management tools (Ansible, Puppet, Chef, Terraform)
- Strong analytical and problem-solving skills with ability to diagnose complex technical issues under pressure
- Excellent communication and collaboration skills; ability to work effectively with cross-functional teams
Preferred Qualifications
- Experience managing large-scale GPU infrastructure (NVIDIA H100, A100, B200, GB200) in production environments supporting AI/ML workloads
- Deep knowledge of GPU architecture, the CUDA toolkit, GPU drivers, and monitoring tools (nvidia-smi, DCGM)
- Experience with HPC cluster management, job schedulers (Slurm, PBS, LSF), and container orchestration (Kubernetes, Docker)
- Proficiency in out-of-band management through BMCs (IPMI, Redfish) and firmware management for server hardware
- Experience with high-performance networking (InfiniBand, RoCE, RDMA) and network troubleshooting in GPU cluster environments
- Familiarity with datacenter operations including rack installations, cabling, power management, and thermal considerations
Salary & Benefits
- Competitive total compensation package (salary + equity).
- Retirement or pension plan, in line with local norms.
- Health, dental, and vision insurance.
- Generous PTO policy, in line with local norms.
The base salary range for this position is $200,000 - $300,000 per year, depending on experience, skills, qualifications, and location. This range represents our good faith estimate of the compensation for this role at the time of posting. Total compensation may also include equity in the form of stock options.
We are committed to pay equity and transparency.
Fluidstack is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans' status, or any other characteristic protected by law. Fluidstack will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.