Operations Engineer, Fleet Reliability

$83k - $110k
Full-time

CoreWeave

CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025.

WHAT YOU'LL DO:

The Fleet Reliability Operations team is responsible for the day-to-day provisioning, management and uptime of CoreWeave’s ever-expanding fleet of server nodes. Playing a central role in CoreWeave’s growth strategy, this team is on the front line for configuration, updates and remote troubleshooting of our highest tier of supercomputing clusters and their networking, delivery platforms and tools dependencies. You will be in a daily battle with the forces of entropy to maximize the number of nodes CoreWeave can deliver to customers.

We are seeking curious, creative and persistent problem solvers to join our Fleet Reliability Operations team to help us drive batches of server nodes through our provisioning and validation processes while efficiently and effectively troubleshooting node or cluster problems as they arise. This individual will join a team of committed engineers working to deploy nodes as fast as they can be racked and turned on.
  • Configure and maintain large-scale high-performance supercomputing clusters running state-of-the-art GPUs
  • Troubleshoot hardware and software issues; escalate and coordinate as needed with data center, network, hardware and platform teams to drive resolution
  • Monitor and analyze system performance and take appropriate remediation actions for cloud health
  • Approach your work with flexibility and optimism anticipating shifting business and technical priorities
  • Create and maintain documentation of team processes, knowledge and best practices for system management
  • Think critically about your day-to-day work and work collaboratively to improve team processes and efficiency
  • Participate in oncall rotations which include after hours and weekend work

WHO YOU ARE:

Minimum Qualifications
  • Strong understanding of Linux system administration and internals
  • Ability to troubleshoot hardware and software issues and perform system maintenance tasks consistently and reliably
  • Software development or scripting languages (Bash, Python, PowerShell, etc.)
Preferred Qualifications
  • 2+ years of experience troubleshooting or administering data center or on-prem infrastructure (servers, storage, network or a mix)
  • Grafana, Prometheus, PromQL queries or similar observability platforms
  • Data center environments including server racks, HVAC systems, fiber trays
  • Kubernetes administration
  • HPC - administering GPU-related workloads
  • Bachelor’s degree in a related field or equivalent experience
Wondering if you’re a good fit? We believe in investing in our people, and value candidates who can bring their own diversified experiences to our teams – even if you aren't a 100% skill or experience match.

WHY COREWEAVE?

At CoreWeave, we work hard, have fun, and move fast! We’re in an exciting stage of hyper-growth that you will not want to miss out on. We’re not afraid of a little chaos, and we’re constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:
  • Be Curious at Your Core
  • Act Like an Owner
  • Empower Employees
  • Deliver Best-in-Class Client Experiences
  • Achieve More Together
We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and enables the development of innovative solutions to complex problems. As we get set for takeoff, the organization's growth opportunities are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too. Come join us!

The base salary range for this role is $83,000 to $110,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).

What We Offer
The range we’ve posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate which can include a variety of factors. These include qualifications, experience, interview performance, and location. In addition to a competitive salary, we offer a variety of benefits to support your needs. The benefits below reflect our US-based offerings; for roles in other locations, benefits vary and are shared during the hiring process. These include:
  • Medical, dental, and vision insurance - 100% paid for by CoreWeave
  • Company-paid Life Insurance
  • Voluntary supplemental life insurance
  • Short and long-term disability insurance
  • Flexible Spending Account
  • Health Savings Account
  • Tuition Reimbursement
  • Ability to Participate in Employee Stock Purchase Program (ESPP)
  • Mental Wellness Benefits through Spring Health
  • Family-Forming support provided by Carrot
  • Paid Parental Leave
  • Flexible, full-service childcare support with Kinside
  • 401(k) with a generous employer match
  • Flexible PTO
  • Catered lunch each day in our office and data center locations
  • A casual work environment
  • A work culture focused on innovative disruption
California Applicants
California Consumer Privacy Act

Equal Opportunity & Accommodations
CoreWeave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, sexual orientation, gender identity, national origin, veteran status, or genetic information. As part of this commitment and consistent with the Americans with Disabilities Act (ADA), CoreWeave will ensure that qualified applicants and candidates with disabilities are provided reasonable accommodations for the hiring process, unless such accommodation would cause an undue hardship. If reasonable accommodation is needed, please contact CoreWeave at the email address shown on click.appcast.io.

Export Control Compliance
This position requires access to export controlled information. To conform to U.S. Government export regulations applicable to that information, the applicant must be either (A) a U.S. person, defined as a (i) U.S. citizen or national, (ii) U.S. lawful permanent resident (green card holder), (iii) refugee under 8 U.S.C. § 1157, or (iv) asylee under 8 U.S.C. § 1158, (B) eligible to access the export controlled information without a required export authorization, or (C) eligible and reasonably likely to obtain the required export authorization from the applicable U.S. government agency. CoreWeave may, for legitimate business reasons, decline to pursue any export licensing process.

Vacancy posted 3 days ago