Senior Site Reliability Engineer
Hyperbolic Labs
Who We Are

Hyperbolic Labs is on a mission to democratize AI by breaking down the barriers to computing power with our Open-Access AI Cloud. By aggregating computing resources across the globe, we offer an innovative GPU marketplace and AI inference service that promise affordability and accessibility for all. As pioneers at the intersection of AI and open-source technology, we believe in an open future where AI innovation is limited only by imagination, not by access to resources.

We're looking for forward-thinking individuals who share our passion for making AI universally accessible, secure, and affordable. Join us in building a platform that empowers innovators everywhere to turn their visionary AI projects into reality. As we prepare for growth after our Series A, our team, led by co-founders with PhDs in AI, Math, and Computer Science, is poised to redefine computing.

About the Role

We're seeking a Site Reliability Engineer to ensure Hyperbolic's GPU marketplace and AI infrastructure operate with exceptional reliability, performance, and security. As an aggregator of compute resources from hundreds of global suppliers, our SLOs, trust, and economic efficiency are product-critical. You'll be responsible for defining and maintaining service level objectives for job success rates, building robust incident response systems, managing capacity across our distributed GPU network, and implementing secure rollout and rollback mechanisms that keep our platform running smoothly 24/7.

In this role, you'll establish the reliability standards that define customer trust in our platform, design monitoring and alerting systems that provide deep visibility into our infrastructure, build automation for capacity management and resource allocation, lead incident response and post-mortem processes, and work closely with engineering teams to improve system resilience.

You'll also focus on security and infrastructure hardening, ensuring strong isolation between tenants and suppliers, implementing key management systems, and building compliance frameworks. This is a high-impact position where your work directly influences our ability to deliver on our promise of affordable, accessible AI compute at scale.

Who You Are

- Expert in site reliability engineering with proven experience defining, monitoring, and maintaining SLOs and SLAs for production systems
- Strong background in capacity planning and management, including forecasting, resource allocation, and cost optimization for distributed systems
- Experienced in incident response, on-call rotations, and post-mortem processes with a track record of reducing MTTR and improving system resilience
- Deep knowledge of deployment systems including progressive rollouts, canary deployments, feature flags, and automated rollback mechanisms
- Proficient in observability tools and practices including metrics, logging, tracing, and alerting systems (Prometheus, Grafana, ELK stack, or similar)
- Strong understanding of infrastructure security including tenant isolation, workload isolation, network segmentation, and security hardening
- Experience with secrets management, key management systems (KMS), certificate management, and secure credential rotation
- Knowledge of compliance frameworks and security best practices for cloud platforms (SOC 2, ISO 27001, or similar)
- Excellent problem-solving skills with ability to debug complex distributed systems issues under pressure
- Strong automation mindset with experience using infrastructure-as-code, configuration management, and CI/CD pipelines

Preferred Qualifications

- Experience operating GPU infrastructure, AI/ML platforms, or compute marketplaces at scale
- Background in distributed systems, peer-to-peer networks, or decentralized infrastructure
- Knowledge of multi-tenancy security patterns, container security, and runtime security tools
- Experience with chaos engineering, fault injection, and resilience testing
- Familiarity with cost optimization strategies for cloud infrastructure and GPU resources
- Experience building and operating systems with demanding uptime requirements (99.9%+ SLAs)
- Background at companies like AWS, Google Cloud, Azure, or fast-growing infrastructure startups
- Contributions to open-source reliability, observability, or security tools

Hyperbolic is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.



