Senior Production Engineer (Reliability)
$182k - $242k
CoreWeave
Job Description
CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at
About the Role
Production Engineering ensures CoreWeave's cloud delivers world-class reliability, performance, and operational excellence. We are hiring a Senior Production Engineer to take direct, hands-on ownership of critical tooling that drives reliability and delivery success.
In this role, you will work broadly across the cloud stack designing, implementing, deploying, and operating systems that improve delivery velocity, service availability, and operational safety. You'll be responsible for leading end-to-end technical projects, maintaining long-lived systems the team owns, and strengthening our operational foundations through durable engineering investments.
This is a role for someone who enjoys building, debugging, and operating production systems. You will collaborate closely with service owners, but your primary impact comes from the reliability, quality, and maturity of the systems you deliver and maintain over time.
What You'll Do
- Take hands-on ownership of critical systems and frameworks, driving their architecture, implementation, and long-term evolution.
- Lead end-to-end delivery of engineering projects that improve availability, scalability, operational automation, and failure recovery.
- Build and maintain observability, alerting, automated remediation, and resilience testing for the systems you support.
- Participate in incident response as a subject-matter expert; drive deep root-cause investigations and implement lasting fixes.
- Improve runbooks, sources of truth, deployment workflows, and operational tooling to harden production readiness.
- Eliminate single points of failure and reduce operational toil through automation, refactors, and system redesigns.
- Ship production code regularly in Python, Go, or similar languages, and participate in on-call rotations.
- Maintain and mature long-term projects and frameworks owned by the team, ensuring they remain reliable, well-instrumented, and easy to operate.
- Collaborate with platform teams to ensure new features and services integrate cleanly with our reliability best practices and tooling.
- 7+ years of engineering experience building and operating distributed systems or cloud platforms.
- Demonstrated ability to debug complex production issues end-to-end, across services, infrastructure layers, and automation.
- Strong programming or scripting ability (Python, Go, or similar), with experience shipping and operating production services and tools.
- Deep knowledge of cloud-native technologies and distributed system patterns, particularly Kubernetes.
- Experience with modern observability stacks: metrics, tracing, structured logs, SLOs/SLIs, and incident lifecycle practices.
- A track record of successfully delivering hands-on reliability improvements through engineering execution.
- Experience building internal tooling, frameworks, or automation that supports high-availability cloud operations.
- Familiarity with DR/BCP, service tiering, capacity planning, or chaos engineering.
- Background operating or building large-scale AI or GPU-accelerated infrastructure.
- Experience maintaining multi-year ownership of foundational production systems.
The base salary range for this role is $182,000 to $242,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).
What We Offer
The range we've posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate, which reflects a variety of factors, including qualifications, experience, interview performance, and location.
In addition to a competitive salary, we offer a variety of benefits to support your needs. The benefits below reflect our US-based offerings; for roles in other locations, benefits vary and are shared during the hiring process. These include:
- Medical, dental, and vision insurance - 100% paid for by CoreWeave
- Company-paid Life Insurance
- Voluntary supplemental life insurance
- Short and long-term disability insurance
- Flexible Spending Account
- Health Savings Account
- Tuition Reimbursement
- Ability to Participate in Employee Stock Purchase Program (ESPP)
- Mental Wellness Benefits through Spring Health
- Family-Forming support provided by Carrot
- Paid Parental Leave
- Flexible, full-service childcare support with Kinside
- 401(k) with a generous employer match
- Flexible PTO
- Catered lunch each day in our office and data center locations
- A casual work environment
- A work culture focused on innovative disruption
California Applicants
California Consumer Privacy Act
Equal Opportunity & Accommodations
CoreWeave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, sexual orientation, gender identity, national origin, veteran status, or genetic information.
As part of this commitment and consistent with the Americans with Disabilities Act (ADA), CoreWeave will ensure that qualified applicants and candidates with disabilities are provided reasonable accommodations for the hiring process, unless such accommodation would cause an undue hardship. If reasonable accommodation is needed, please contact: View email address on ziprecruiter.com.
Export Control Compliance
This position requires access to export controlled information. To conform to U.S. Government export regulations applicable to that information, applicant must either be (A) a U.S. person, defined as a (i) U.S. citizen or national, (ii) U.S. lawful permanent resident (green card holder), (iii) refugee under 8 U.S.C. § 1157, or (iv) asylee under 8 U.S.C. § 1158, (B) eligible to access the export controlled information without a required export authorization, or (C) eligible and reasonably likely to obtain the required export authorization from the applicable U.S. government agency. CoreWeave may, for legitimate business reasons, decline to pursue any export licensing process.


