Operations Engineer, Fleet Reliability
$83k - $110k
CoreWeave
CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at [
WHAT YOU'LL DO:
The Fleet Reliability Operations team is responsible for the day-to-day provisioning, management and uptime of CoreWeave’s ever-expanding fleet of server nodes. Playing a central role in CoreWeave’s growth strategy, this team is on the front line for configuration, updates and remote troubleshooting of our highest tier of supercomputing clusters and their networking, delivery platforms and tools dependencies. You will be in a daily battle with the forces of entropy to maximize the number of nodes CoreWeave can deliver to customers.
We are seeking curious, creative and persistent problem solvers to join our Fleet Reliability Operations team to help us drive batches of server nodes through our provisioning and validation processes while efficiently and effectively troubleshooting node or cluster problems as they arise. This individual will join a team of committed engineers working to deploy nodes as fast as they can be racked and turned on.
- Configure and maintain large-scale high-performance supercomputing clusters running state-of-the-art GPUs
- Troubleshoot hardware and software issues; escalate and coordinate as needed with data center, network, hardware and platform teams to drive resolution
- Monitor and analyze system performance and take appropriate remediation actions for cloud health
- Approach your work with flexibility and optimism anticipating shifting business and technical priorities
- Create and maintain documentation of team processes, knowledge and best practices for system management
- Think critically about your day-to-day work and work collaboratively to improve team processes and efficiency
- Participate in oncall rotations which include after hours and weekend work
WHO YOU ARE:
Minimum Qualifications
- Strong understanding of Linux system administration and internals
- Ability to troubleshoot hardware and software issues and perform system
- Grafana, Prometheus, PromQL queries or similar observability platforms
- Data center environments including server racks, HVAC systems, fiber trays
- Kubernetes administration
- HPC - administering GPU-related workloads
- Bachelor’s degree in a related field or equivalent experience
WHY COREWEAVE?
At CoreWeave, we work hard, have fun, and move fast! We’re in an exciting stage of hyper-growth that you will not want to miss out on. We’re not afraid of a little chaos, and we’re constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:
- Be Curious at Your Core
- Act Like an Owner
- Empower Employees
- Deliver Best-in-Class Client Experiences
- Achieve More Together
- Medical, dental, and vision insurance - 100% paid for by CoreWeave
- Company-paid Life Insurance
- Voluntary supplemental life insurance
- Short and long-term disability insurance
- Flexible Spending Account
- Health Savings Account
- Tuition Reimbursement
- Ability to Participate in Employee Stock Purchase Program (ESPP)
- Mental Wellness Benefits through Spring Health
- Family-Forming support provided by Carrot
- Paid Parental Leave
- Flexible, full-service childcare support with Kinside
- 401(k) with a generous employer match
- Flexible PTO
- Catered lunch each day in our office and data center locations
- A casual work environment
- A work culture focused on innovative disruption


