Site Reliability Engineer

FLUIX

FLUIX is building the AI operating system that plans, designs, and optimizes AI infrastructure. We are based in Silicon Valley and provide AI-driven solutions for data centers and power providers, built on machine learning (ML) and artificial intelligence (AI) technologies. Our mission is to double America’s compute capacity without building new data centers.

We are seeking a skilled Site Reliability Engineer to join our growing team. The ideal candidate will help ensure the reliability, scalability, and performance of our hybrid (cloud and on-prem) platform while supporting our AI/ML infrastructure. You will work closely with our engineering, AI, and operations teams to build and maintain robust systems that support our solutions. Your expertise in ML/AI and experience with data center sites will be crucial to the success of our platform.

Who you’ll work closely with

Founder & CEO Chase Overcash
CTO

What you’ll do

Design, implement, and maintain scalable systems while optimizing performance, ensuring high availability and disaster recovery, and assisting with codebase refactoring for modular deployment.

Develop and maintain automation tools that streamline operations, eliminate repetitive tasks, and improve system reliability.

Collaborate with engineering and data science teams to integrate ML and AI models into production environments, ensuring the models perform well within our technology stack.

Identify areas for improvement and drive initiatives to enhance system reliability and performance, while staying current on industry trends and advances in SRE practices, ML, and AI technologies.

Respond to and resolve incidents quickly to minimize impact, then conduct post-incident reviews and implement improvements to prevent recurrence.

Create and manage multiple cloud instances (dev, staging, test), optimize cloud infrastructure and data center operations, and ensure the security and compliance of both infrastructure and applications.

Your background

Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).

Proven experience as a Site Reliability Engineer or in a similar role in a SaaS environment, with a strong background in managing and optimizing cloud infrastructure (AWS preferred, or GCP, Azure), experience with ML and AI technologies, and familiarity with data center operations integrations.

Proficiency in programming and scripting languages (e.g., Python), experience with containerization and orchestration tools (Kubernetes), a strong understanding of networking, security, and performance optimization, and knowledge of CI/CD pipelines and DevOps practices.

Excellent problem-solving skills with attention to detail, strong communication and collaboration abilities, and the capacity to thrive in a fast-paced, dynamic startup environment.

Culture Fit

We are looking for obsessed individuals who want to give it their all.

We are not afraid to get our hands dirty with physical and software systems.

We are eager to visit and work with clients and understand the importance and gravity of their mission-critical work.

We are eager to come into the office and on-site, as our work directly affects physical environments.

Because our work is mission-critical, we understand the need to help our teammates and co-workers during holidays, weekends, and emergencies, and we are eager to do so.

We are cordial and over-communicate with teammates, co-workers, and management.

Attractive compensation package, including equity options.

Comprehensive health, dental, and vision insurance, along with other standard benefits.

A dynamic and collaborative San Francisco Bay Area work environment.

Opportunities for professional growth and development, with the chance to shape the future of technology in the industry.

Vacancy posted 1 day ago
