Senior Production Engineer, Operational Excellence
Crusoe
Job Description
Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack, from electrons to tokens, to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster.
We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that — with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.
We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved — people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services.
If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.
About This Role:
Crusoe is building the most reliable, energy-efficient, AI-optimized cloud platform, and Production Engineering sits at the heart of that mission. As a Production Engineer focused on Operational Excellence, you will help ensure the reliability, scalability, and performance of Crusoe’s GPU cloud that powers next-generation AI workloads.
This role is ideal for engineers who enjoy solving complex production problems, improving large-scale distributed systems, and building automation that keeps infrastructure running smoothly. You’ll play a key role in strengthening the operational foundation of Crusoe’s cloud while helping scale infrastructure that supports demanding AI and HPC workloads.
You’ll partner closely with Production Engineers, infrastructure teams, and platform engineers to improve system reliability, reduce operational toil, and drive continuous improvements across Crusoe’s rapidly growing GPU cloud.
What You’ll Be Working On:
Collaborate with cross-functional teams to define and evolve availability metrics for Crusoe’s cloud platform, including establishing, measuring, and improving SLIs and SLOs
Participate in production incident response, diagnosing and resolving service disruptions while contributing to post-incident reviews and root cause analysis
Build, operate, and improve observability across Crusoe’s infrastructure using tools such as Prometheus, Grafana, Alertmanager, and OpenTelemetry
Identify reliability risks, performance bottlenecks, and early indicators of potential production issues across distributed systems
Develop automation and tooling that reduces operational toil, improves recovery times, and enables self-healing infrastructure
Partner with compute, networking, storage, and platform teams to strengthen service resilience and disaster recovery capabilities
Contribute to improving operational processes, knowledge sharing, and reliability best practices across the engineering organization
Continue growing technical depth through mentorship, training, and hands-on work operating large-scale AI infrastructure
5+ years of experience in Production Engineering, SRE, or large-scale infrastructure operations
Experience supporting GPU workloads, HPC environments, or latency/throughput-sensitive distributed systems
Strong knowledge of Linux/Unix systems, including debugging complex issues across kernel and user space
Previous experience in Infrastructure roles building or managing compute, storage or networking platforms
Understanding of modern cloud infrastructure fundamentals including Kubernetes, distributed systems, virtualization, and cloud platforms (AWS/GCP)
Familiarity with incident management practices and reliability frameworks (SRE, ITIL, or similar)
Experience with monitoring and observability tools such as Prometheus and Grafana, or a strong desire to deepen expertise in this area
Familiarity with infrastructure-as-code and configuration management tools such as Terraform or Ansible
Scripting or programming experience with languages such as Go, Python, C, or C++
Strong communication skills and the ability to collaborate across engineering teams
Ability to remain calm and effective while troubleshooting complex issues in high-impact production environments
A growth mindset and strong interest in reliability engineering, automation, and operational excellence
Experience working with Kubernetes or container orchestration platforms at scale
Exposure to change management processes, operational readiness reviews, or structured root cause analysis
Experience designing self-healing systems, automated remediation, or event-driven operational tooling
Interest in scaling AI or HPC infrastructure and solving reliability challenges in GPU-heavy environments
Passion for mentorship, learning, and developing deeper expertise in Production Engineering
Industry competitive pay
Restricted Stock Units in a fast-growing, well-funded technology company
Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
Employer contributions to HSA accounts
Paid Parental Leave
Paid life insurance, short-term and long-term disability
Teladoc
401(k) with a 100% match up to 4% of salary
Generous paid time off and holiday schedule
Cell phone reimbursement
Tuition reimbursement
Subscription to the Calm app
MetLife Legal
Company-paid commuter benefit: $300 per month
Compensation will be paid in the range of $172,000 – $209,000 + Bonus. Restricted Stock Units are included in all offers. Compensation will be determined by the applicant’s education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.
Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.


