Data Center Reliability Engineer
$119.2k - $163.9k
Phaidra
Job Description
About Phaidra
Phaidra is building the future of industrial automation.
The world today is filled with static, monolithic infrastructure. Factories, power plants, buildings, etc. operate the same way they have for decades because the controls programming is hard-coded: thousands of lines of rules and heuristics that define how the machines interact with each other. The result of all this hard-coding is that facilities are frozen in time, unable to adapt to their environment while their performance slowly degrades.
Phaidra creates AI-powered control systems for the industrial sector, enabling industrial facilities to automatically learn and improve over time. Specifically:
- We use reinforcement learning algorithms to provide this intelligence, converting raw sensor data into high-value actions and decisions.
- We focus on industrial applications, which tend to be well-sensorized with measurable KPIs — perfect for reinforcement learning.
- We enable domain experts (our users) to configure the AI control systems (i.e. agents) without writing code. They define what they want their AI agents to do, and we do it for them.
Our team has a track record of applying AI to some of the toughest problems. From achieving superhuman performance with DeepMind's AlphaGo, to reducing the energy required to cool Google's Data Centers by 40%, we deeply understand AI and how to apply it in production for massive impact.
Phaidra's ability to achieve its mission is determined by our ability to work together, as defined by our core values: Transparency, Collaboration, Operational Excellence, Ownership, and Empathy. We seek individuals who embody these values, as they are instrumental in ensuring our team consistently delivers excellence and fosters an engaging and supportive culture.
Phaidra is based in the USA, but we are 100% remote with no physical office. We hire employees internationally with the help of our partner, OysterHR. Our team is currently located throughout the USA, Canada, UK, Sweden, Spain, Portugal, the Netherlands, Singapore, Australia, and India.
Who You Are
As a Data Center Reliability Engineer on the Data Science team, you are the "bridge" between raw infrastructure telemetry and actionable operational intelligence. You don't just see numbers; you possess a deep mechanical and electrical empathy that allows you to read a system's data as a doctor reads a patient's chart. You are highly inquisitive, approaching complex anomalies without bias to uncover true root causes.
This is not a "hide behind the keyboard" role. You thrive in a hands-off environment, treating your space with a sense of proactive ownership and treating your peers as fellow experts. You are a truth-teller who uses thorough, compassionate communication to persuade others and drive impact in the high-stakes world of data center uptime. We expect our teammates to arrive ready to own their space and contribute to the team's collective success immediately.
We are seeking a U.S.-based team member (Pacific Time Zone preferred) with flexibility to work hours that overlap with APAC time zones as needed.
Responsibilities
You will utilize our existing data ingestion and delivery platforms to "teach" our models to understand the physical world, filling a critical expertise gap in the data center industry.
- Multidisciplinary Diagnostic Analysis: Use telemetry tools to analyze sensor data across mechanical (chillers, pumps) and electrical (UPS, switchgear, power feeds) systems to identify "failure signatures" for our LLM-driven monitoring tool.
- Refining the Logic Engine: Act as a primary user of our platforms, identifying gaps in our current mechanisms and collaborating with Engineering to influence future features and data quality.
- Operational Insight Generation: Translate raw telemetry into the "SME-level" logic and directions used by our LLM tool to guide data center operators in real-time.
- SME Development: Cultivate deep domain expertise in all facets of data center infrastructure. You will be expected to master the nuances of both mechanical and electrical dependencies to ensure our product reflects operational reality.
- Customer Guidance: Move from shadowing peers to directly supporting customers, using our platform to provide clear, data-backed direction on complex problems.
- Model Validation: Oversee pilot projects to test how our AI-driven SME tool interprets real-world stressors, ensuring the output is operationally realistic, accurate, and actionable.
- Adaptability: Remain agile and proactive. As a member of a fast-moving team, you will encounter challenges and scopes not explicitly defined here; we expect you to lean in and solve them.
Key Qualifications
- Experience: 2–3 years of relevant professional experience.
- Educational Background: Bachelor's degree in Mechanical Engineering, Electrical Engineering, Control Theory, or a related field that provides a foundation in physical systems and thermodynamics.
- Analytical Grit: A deep, innate interest in using data to diagnose how and why systems fail. You are a "tinkerer" who prefers solving real-world problems over theoretical research.
- Technical Proficiency: Strong Python skills and experience with data manipulation libraries (Pandas/NumPy) to perform custom analysis outside of standard tooling.
- Communication Mastery: Ability to explain complex diagnostic findings clearly and persuasively to both technical peers and non-domain stakeholders.
- Unbiased Problem-Solving: A proven ability to look at a problem without preconceived notions and figure out solutions either independently or via team collaboration.
- Alignment with Values: Demonstrated commitment to Transparency, Collaboration, and Ownership—especially in environments where reliability and learning from failure are paramount.
Preferred Skills & Experience
- Infrastructure Exposure: Experience with critical infrastructure components (HVAC, power distribution, or industrial automation).
- Industrial IoT: Experience with time-series data from industrial sensors (SCADA, BMS, Smart Meters).
- AI/LLM Curiosity: Exposure to or a strong interest in how LLMs can be used for root-cause analysis and automated reporting.
Onboarding
In your first 30 days…
- Familiarize yourself with the company handbook and team roadmap.
- Review existing system ontologies and sensor data structures across both mechanical and electrical domains.
- Shadow senior team members during customer diagnostic reviews to understand the "voice" of the SME.
In your first 60 days…
- Build full proficiency in our internal data tools and analysis workflows.
- Identify failure signatures in customer data with peer guidance and begin automating detection logic.
- Identify at least one gap in our current tooling and propose a logic-based solution to Engineering.
In your first 90 days…
- Provide direct guidance to customers on anomalies with peer support, moving toward full self-sufficiency.
- Contribute to the refinement of the LLM "instruction set" for cross-disciplinary diagnostics.
- Present a post-incident analysis correlating telemetry to a real-world root cause to the broader product team.
All of our interviews are held via Google Meet, and an active camera connection is required.
- Meeting with People Operations team member (30 minutes)
- Meeting with Hiring Manager (60 minutes)
- Technical Interview with Data Science team member (60 minutes)
- Meeting with Program Manager (60 minutes)
- Culture fit interview with Phaidra's co-founders (30 minutes)
US Residents:
- Tier 1 (Largest, highest-cost metros): $119,200 - $163,900
- Tier 2 (Other major metros): $113,240 - $155,705
- Tier 3 (Mid-sized metro areas): $107,280 - $147,510
- Tier 4 (All other locations): $101,320 - $139,315
In addition to base salary, this position is eligible for equity. Final salary will be determined based on several factors, including a candidate's qualifications, skills, competencies, experience, expertise, education and location. In some cases, final compensation may fall outside the posted range. Salary ranges are regularly reviewed and may be adjusted in response to market trends.
Benefits & Perks
- Fast-paced, team-oriented environment where your work directly shapes the company's direction.
- We are a 100% remote company.
- Competitive compensation & meaningful equity.
- Outsized responsibilities & professional development.
- Training is foundational: functional, customer immersion, and development training.
- Medical, dental, and vision insurance (exact benefits vary by region).
- Unlimited paid time off, with a required minimum of 20 days per year.
- Paid parental leave (exact benefits vary by region).
- Flexible stipends to support your workspace, well-being, and continued professional development.
- Company MacBook.
Please note: Not all of Phaidra's benefits and perks listed above apply to temporary employees such as interns.
On being Remote
We take a thoughtful and intentional approach to remote collaboration. Inspired by pioneers like GitLab, we embrace proven best practices to foster an exceptional remote work environment. Our culture is documentation-first, and we prioritize asynchronous communication to support focus and flexibility across time zones. While we value independence, we stay closely connected through tools like Slack and video conferencing. Weekly all-hands meetings help us align and build strong relationships, and we regularly host virtual team-building activities and social events to maintain a sense of camaraderie.
Equal Opportunity Employment
Phaidra is an Equal Opportunity Employer; employment with Phaidra is governed on the basis of merit, competence, and qualifications and will not be influenced in any manner by race, color, religion, gender, national origin/ethnicity, veteran status, disability status, age, sexual orientation, gender identity, marital status, mental or physical disability, or any other legally protected status. We welcome diversity and strive to maintain an inclusive environment for all employees. If you need assistance with completing the application process, please contact us.
E-Verify Notice
Phaidra participates in E-Verify, an employment authorization database provided through the U.S. Department of Homeland Security (DHS) and Social Security Administration (SSA). As required by law, we will provide the SSA and, if necessary, the DHS, with information from each new employee's Form I-9 to confirm work authorization for those residing in the United States.
Additional information about E-Verify can be found here.
To be considered for any position at Phaidra, you must submit an online application. This role will remain open until it is filled.
Phaidra only hires individuals who are legally authorized to work in the specified location(s) above. We do not provide employment sponsorship. Candidates requiring visa sponsorship, either now or in the future, are not eligible for hire.
Candidates who advance beyond the initial screening stage will be required to sign a Non-Disclosure Agreement (NDA) in order to continue through the interview process.
All employment offers are contingent upon successful completion of employment authorization verification and applicable background checks, in accordance with local laws and company policies.
WE DO NOT ACCEPT APPLICATIONS FROM RECRUITERS.


