Senior Production Engineer (Reliability)

$182k - $242k

CoreWeave

Job Description

CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at

About the Role

Production Engineering ensures CoreWeave's cloud delivers world-class reliability, performance, and operational excellence. We are hiring a Senior Production Engineer to take direct, hands-on ownership of critical tooling that drives reliability and delivery success.

In this role, you will work broadly across the cloud stack, designing, implementing, deploying, and operating systems that improve delivery velocity, service availability, and operational safety. You'll be responsible for leading end-to-end technical projects, maintaining long-lived systems the team owns, and strengthening our operational foundations through durable engineering investments.

This is a role for someone who enjoys building, debugging, and operating production systems. You will collaborate closely with service owners, but your primary impact comes from the reliability, quality, and maturity of the systems you deliver and maintain over time.

What You'll Do
  • Take hands-on ownership of critical systems and frameworks, driving their architecture, implementation, and long-term evolution.
  • Lead end-to-end delivery of engineering projects that improve availability, scalability, operational automation, and failure recovery.
  • Build and maintain observability, alerting, automated remediation, and resilience testing for the systems you support.
  • Participate in incident response as a subject-matter expert; drive deep root-cause investigations and implement lasting fixes.
  • Improve runbooks, sources of truth, deployment workflows, and operational tooling to harden production readiness.
  • Eliminate single points of failure and reduce operational toil through automation, refactors, and system redesigns.
  • Ship production code regularly in Python, Go, or similar languages, and participate in on-call rotations.
  • Maintain and mature long-term projects and frameworks owned by the team, ensuring they remain reliable, well-instrumented, and easy to operate.
  • Collaborate with platform teams to ensure new features and services integrate cleanly with our reliability best practices and tooling.
What You've Worked On (Minimum Qualifications)
  • 7+ years of engineering experience building and operating distributed systems or cloud platforms.
  • Demonstrated ability to debug complex production issues end-to-end, across services, infrastructure layers, and automation.
  • Strong programming or scripting ability (Python, Go, or similar), with experience shipping and operating production services and tools.
  • Deep knowledge of cloud-native technologies and distributed system patterns, particularly Kubernetes.
  • Experience with modern observability stacks: metrics, tracing, structured logs, SLOs/SLIs, and incident lifecycle practices.
  • A track record of successfully delivering hands-on reliability improvements through engineering execution.
Preferred Qualifications
  • Experience building internal tooling, frameworks, or automation that supports high-availability cloud operations.
  • Familiarity with DR/BCP, service tiering, capacity planning, or chaos engineering.
  • Background operating or building large-scale AI or GPU-accelerated infrastructure.
  • Experience maintaining multi-year ownership of foundational production systems.

The base salary range for this role is $182,000 to $242,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).

What We Offer

The range we've posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate, which can reflect a variety of factors, including qualifications, experience, interview performance, and location.

In addition to a competitive salary, we offer a variety of benefits to support your needs. The benefits below reflect our US-based offerings; for roles in other locations, benefits vary and are shared during the hiring process. These include:

  • Medical, dental, and vision insurance - 100% paid for by CoreWeave
  • Company-paid Life Insurance
  • Voluntary supplemental life insurance
  • Short- and long-term disability insurance
  • Flexible Spending Account
  • Health Savings Account
  • Tuition Reimbursement
  • Ability to Participate in Employee Stock Purchase Program (ESPP)
  • Mental Wellness Benefits through Spring Health
  • Family-Forming support provided by Carrot
  • Paid Parental Leave
  • Flexible, full-service childcare support with Kinside
  • 401(k) with a generous employer match
  • Flexible PTO
  • Catered lunch each day in our office and data center locations
  • A casual work environment
  • A work culture focused on innovative disruption

California Applicants

California Consumer Privacy Act

Equal Opportunity & Accommodations

CoreWeave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, sexual orientation, gender identity, national origin, veteran status, or genetic information.

As part of this commitment and consistent with the Americans with Disabilities Act (ADA), CoreWeave will ensure that qualified applicants and candidates with disabilities are provided reasonable accommodations for the hiring process, unless such accommodation would cause an undue hardship. If reasonable accommodation is needed, please contact: View email address on ziprecruiter.com.

Export Control Compliance

This position requires access to export controlled information. To conform to U.S. Government export regulations applicable to that information, applicant must either be (A) a U.S. person, defined as a (i) U.S. citizen or national, (ii) U.S. lawful permanent resident (green card holder), (iii) refugee under 8 U.S.C. § 1157, or (iv) asylee under 8 U.S.C. § 1158, (B) eligible to access the export controlled information without a required export authorization, or (C) eligible and reasonably likely to obtain the required export authorization from the applicable U.S. government agency. CoreWeave may, for legitimate business reasons, decline to pursue any export licensing process.

Vacancy posted 2 days ago