Operations Engineer, Fleet Reliability
$83k - $110k
CoreWeave
Job Description
CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at
What You'll Do:
The Fleet Reliability Operations team is responsible for the day-to-day provisioning, management and uptime of CoreWeave's ever-expanding fleet of server nodes. Playing a central role in CoreWeave's growth strategy, this team is on the front line for configuration, updates and remote troubleshooting of our highest tier of supercomputing clusters and their networking, delivery platforms and tools dependencies. You will be in a daily battle with the forces of entropy to maximize the number of nodes CoreWeave can deliver to customers.
We are seeking curious, creative and persistent problem solvers to join our Fleet Reliability Operations team to help us drive batches of server nodes through our provisioning and validation processes while efficiently and effectively troubleshooting node or cluster problems as they arise. This individual will join a team of committed engineers working to deploy nodes as fast as they can be racked and turned on.
- Configure and maintain large-scale high-performance supercomputing clusters running state-of-the-art GPUs
- Troubleshoot hardware and software issues; escalate and coordinate as needed with data center, network, hardware and platform teams to drive resolution
- Monitor and analyze system performance and take appropriate remediation actions for cloud health
- Approach your work with flexibility and optimism, anticipating shifting business and technical priorities
- Create and maintain documentation of team processes, knowledge and best practices for system management
- Think critically about your day-to-day work and work collaboratively to improve team processes and efficiency
- Participate in on-call rotations, which include after-hours and weekend work
Minimum Qualifications
- Strong understanding of Linux system administration and internals
- Ability to troubleshoot hardware and software issues and perform system maintenance tasks consistently and reliably
- Software development or scripting languages (Bash, Python, PowerShell, etc.)
Preferred Qualifications
- 2+ years of experience troubleshooting or administering data center or on-prem infrastructure (servers, storage, network or a mix)
- Grafana, Prometheus, PromQL queries or similar observability platforms
- Data center environments including server racks, HVAC systems, fiber trays
- Kubernetes administration
- HPC - administering GPU-related workloads
- Bachelor's degree in a related field or equivalent experience
Wondering if you're a good fit? We believe in investing in our people, and value candidates who can bring their own diversified experiences to our teams – even if you aren't a 100% skill or experience match.
Why CoreWeave?
At CoreWeave, we work hard, have fun, and move fast! We're in an exciting stage of hyper-growth that you will not want to miss out on. We're not afraid of a little chaos, and we're constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:
- Be Curious at Your Core
- Act Like an Owner
- Empower Employees
- Deliver Best-in-Class Client Experiences
- Achieve More Together
We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and enables the development of innovative solutions to complex problems. As we get set for takeoff, the organization's growth opportunities are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too. Come join us!
The base salary range for this role is $83,000 to $110,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).
What We Offer
The range we've posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate which can include a variety of factors. These include qualifications, experience, interview performance, and location.
In addition to a competitive salary, we offer a variety of benefits to support your needs. The benefits below reflect our US-based offerings; for roles in other locations, benefits vary and are shared during the hiring process. These include:
- Medical, dental, and vision insurance - 100% paid for by CoreWeave
- Company-paid Life Insurance
- Voluntary supplemental life insurance
- Short and long-term disability insurance
- Flexible Spending Account
- Health Savings Account
- Tuition Reimbursement
- Ability to Participate in Employee Stock Purchase Program (ESPP)
- Mental Wellness Benefits through Spring Health
- Family-Forming support provided by Carrot
- Paid Parental Leave
- Flexible, full-service childcare support with Kinside
- 401(k) with a generous employer match
- Flexible PTO
- Catered lunch each day in our office and data center locations
- A casual work environment
- A work culture focused on innovative disruption
Equal Opportunity & Accommodations
CoreWeave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, sexual orientation, gender identity, national origin, veteran status, or genetic information.
As part of this commitment and consistent with the Americans with Disabilities Act (ADA), CoreWeave will ensure that qualified applicants and candidates with disabilities are provided reasonable accommodations for the hiring process, unless such accommodation would cause an undue hardship. If reasonable accommodation is needed, please contact: View email address on ziprecruiter.com.
Export Control Compliance
This position requires access to export controlled information. To conform to U.S. Government export regulations applicable to that information, applicant must either be (A) a U.S. person, defined as a (i) U.S. citizen or national, (ii) U.S. lawful permanent resident (green card holder), (iii) refugee under 8 U.S.C. § 1157, or (iv) asylee under 8 U.S.C. § 1158, (B) eligible to access the export controlled information without a required export authorization, or (C) eligible and reasonably likely to obtain the required export authorization from the applicable U.S. government agency. CoreWeave may, for legitimate business reasons, decline to pursue any export licensing process.