Senior Production Engineer, Operational Excellence

Crusoe

Job Description

Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack, from electrons to tokens, to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster.

We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that — with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.

We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved — people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services.

If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.

About This Role:

Crusoe is building the most reliable, energy-efficient, AI-optimized cloud platform — and Production Engineering sits at the heart of that mission. As a Production Engineer focused on Operational Excellence, you will help ensure the reliability, scalability, and performance of Crusoe’s GPU cloud that powers next-generation AI workloads.

This role is ideal for engineers who enjoy solving complex production problems, improving large-scale distributed systems, and building automation that keeps infrastructure running smoothly. You’ll play a key role in strengthening the operational foundation of Crusoe’s cloud while helping scale infrastructure that supports demanding AI and HPC workloads.

You’ll partner closely with Production Engineers, infrastructure teams, and platform engineers to improve system reliability, reduce operational toil, and drive continuous improvements across Crusoe’s rapidly growing GPU cloud.

What You’ll Be Working On:
  • Collaborate with cross-functional teams to define and evolve availability metrics for Crusoe’s cloud platform, including establishing, measuring, and improving SLIs and SLOs

  • Participate in production incident response, diagnosing and resolving service disruptions while contributing to post-incident reviews and root cause analysis

  • Build, operate, and improve observability across Crusoe’s infrastructure using tools such as Prometheus, Grafana, Alertmanager, and OpenTelemetry

  • Identify reliability risks, performance bottlenecks, and early indicators of potential production issues across distributed systems

  • Develop automation and tooling that reduces operational toil, improves recovery times, and enables self-healing infrastructure

  • Partner with compute, networking, storage, and platform teams to strengthen service resilience and disaster recovery capabilities

  • Contribute to improving operational processes, knowledge sharing, and reliability best practices across the engineering organization

  • Continue growing technical depth through mentorship, training, and hands-on work operating large-scale AI infrastructure

What You’ll Bring to the Team:
  • 5+ years of experience in Production Engineering, SRE, or large-scale infrastructure operations

  • Experience supporting GPU workloads, HPC environments, or latency/throughput-sensitive distributed systems

  • Strong knowledge of Linux/Unix systems, including debugging complex issues across kernel and user space

  • Previous experience in infrastructure roles building or managing compute, storage, or networking platforms

  • Understanding of modern cloud infrastructure fundamentals, including Kubernetes, distributed systems, virtualization, and cloud platforms (AWS/GCP)

  • Familiarity with incident management practices and reliability frameworks (SRE, ITIL, or similar)

  • Experience with monitoring and observability tools such as Prometheus and Grafana, or a strong desire to deepen expertise in this area

  • Familiarity with infrastructure-as-code and configuration management tools such as Terraform or Ansible

  • Scripting or programming experience with languages such as Go, Python, C, or C++

  • Strong communication skills and the ability to collaborate across engineering teams

  • Ability to remain calm and effective while troubleshooting complex issues in high-impact production environments

  • A growth mindset and strong interest in reliability engineering, automation, and operational excellence

Bonus Points:
  • Experience working with Kubernetes or container orchestration platforms at scale

  • Exposure to change management processes, operational readiness reviews, or structured root cause analysis

  • Experience designing self-healing systems, automated remediation, or event-driven operational tooling

  • Interest in scaling AI or HPC infrastructure and solving reliability challenges in GPU-heavy environments

  • Passion for mentorship, learning, and developing deeper expertise in Production Engineering

Benefits:
  • Industry competitive pay

  • Restricted Stock Units in a fast-growing, well-funded technology company

  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents

  • Employer contributions to HSAs

  • Paid parental leave

  • Paid life insurance and short-term and long-term disability insurance

  • Teladoc

  • 401(k) with a 100% match up to 4% of salary

  • Generous paid time off and holiday schedule

  • Cell phone reimbursement

  • Tuition reimbursement

  • Subscription to the Calm app

  • MetLife Legal

  • Company-paid commuter benefit of $300 per month

Compensation:

Compensation will be paid in the range of $172,000 – $209,000 plus bonus. Restricted Stock Units are included in all offers. Compensation will be determined by the applicant’s education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.

Location: San Francisco, CA

Vacancy posted 23 days ago