Director of Site Reliability Engineering

$210k - $310k

Stellar

Interested in working on cutting-edge blockchain technology and creating equitable access to the global financial system? Since 2014, the mission-driven team at the Stellar Development Foundation (SDF) has helped fuel the tremendous growth of the Stellar blockchain network, an open-source platform that operates at high scale today. Developers and companies around the world build on it, and the SDF team is expanding to support the rapidly growing and changing Stellar ecosystem.

SDF is looking for a Director of Site Reliability Engineering to lead a small, high-leverage SRE team and help shape how engineering teams own, operate, and improve production services. This is a senior engineering leadership role reporting to the CTO. You will set the vision, operating model, and culture for SRE while owning the core infrastructure services that help SDF engineering teams build, deploy, observe, and operate software with confidence.

Engineering teams at SDF own the services they build. SRE provides the frameworks, standards, shared infrastructure, tooling, observability practices, and enablement model that make strong service ownership possible across engineering.

You will be successful here if you bring strong technical judgment, pragmatic leadership, and the ability to influence through trust, clarity, and execution. SDF is a small, mission-driven foundation with a broad technical surface area, so this role requires leverage, ownership, and a bias toward solving the right problems over creating process for its own sake.

In this role, you will:
- Lead, coach, and develop a distributed SRE team, setting a clear vision, charter, operating model, priorities, and success measures.
- Define and roll out a Service Ownership & Maturity Framework across engineering, with expectations that vary appropriately by service criticality.
- Own and improve core engineering infrastructure services, including cloud foundations, Kubernetes and compute patterns, CI/CD, observability, secrets management, GitHub workflows, and infrastructure automation.
- Help engineering teams become stronger owners and operators of their services through better standards, dashboards, runbooks, alerting, escalation paths, operational readiness, and deployment practices.
- Make reliability, operational maturity, infrastructure health, and developer productivity more measurable through trusted metrics and practical operational intelligence.
- Improve deployment automation, resilience, self-healing patterns, disaster recovery readiness, and service reliability based on actual impact and risk.
- Mature incident response, escalation, postmortems, and on-call health across a geographically distributed team.
- Build paved paths and self-service infrastructure that reduce toil, lower cognitive load, and help engineering teams move faster while strengthening ownership and reliability.
- Partner closely with Security, Compliance, Legal, Finance, Procurement, and Corporate IT where infrastructure, access management, cloud operations, vendor review, or controls intersect with engineering.
- Pragmatically evaluate AI-assisted and agentic workflows where they can improve infrastructure operations, service ownership, developer workflows, or toil reduction.

You have:
- 10+ years of experience in SRE, infrastructure engineering, platform engineering, cloud infrastructure, production operations, or closely related engineering roles.
- 5+ years of experience leading, managing, or formally developing infrastructure, SRE, platform, or reliability engineers.
- Strong experience defining team charters, operating models, roadmaps, success measures, and engineering practices for infrastructure or reliability teams.
- Deep technical judgment across cloud infrastructure, production operations, distributed systems, reliability tradeoffs, automation, and operational risk.
- 3+ years of experience with modern cloud infrastructure in AWS, GCP, or similar environments.
- 3+ years of experience with Kubernetes, container orchestration, infrastructure-as-code, declarative systems, CI/CD, and deployment safety.
- Strong experience with observability, monitoring, alerting, logging, dashboards, SLOs/SLIs, incident response, postmortems, and on-call practices.
- Experience helping product or application engineering teams improve service ownership, operational readiness, and production accountability.
- A pragmatic approach to tooling: you understand when to build, buy, adapt, simplify, or retire systems based on the actual engineering problem.
- The ability to operate effectively in a small or mid-sized engineering organization where influence comes from credibility, judgment, and outcomes rather than bureaucracy.
- Clear executive communication skills and the ability to partner directly with a CTO and senior engineering leaders.

Bonus points if you have:
- Experience leading SRE, infrastructure, or platform work in a lean, high-agency organization.
- Experience supporting globally distributed teams or 24/7 operational coverage.
- Experience improving developer productivity through paved paths, self-service infrastructure, automation, and reduced toil.
- Experience with infrastructure security fundamentals, secrets management, access controls, cloud security practices, or compliance-related infrastructure controls.
- Experience in financial services, regulated environments, blockchain, crypto, Web3, or other high-reliability technical ecosystems.
- Experience evaluating vendors and infrastructure platforms with skepticism, technical rigor, and cost discipline.
- Practical experience applying AI-assisted or agentic workflows to infrastructure, reliability, operations, observability, or developer productivity.

We offer competitive pay with a base salary range for this position of $210,000 - $310,000, depending on job-related knowledge, skills, experience, and location.
In addition, we offer lumen-denominated grants along with the following perks and benefits:

USA Benefits/Perks:
- Competitive health, dental & vision coverage, with most plans covered at 100% for the employee + any dependents
- Flexible time off + 15 company holidays, including a company-wide holiday break
- Generous paid parental leave for all parents, plus paid pregnancy disability leave for birthing parents
- Gym reimbursement ($80 per month)
- Life & AD&D insurance (up to $50K)
- Short- & long-term disability
- 401(k) with 4% match
- Health & Dependent Care FSA accounts
- Commuter benefits with $250/month employer contribution
- Health Savings Account (HSA) with monthly employer contribution
- Family building benefits through Kindbody
- Wellbeing benefits (One Medical, Rightway, Headspace)
- L&D budget of $1,500/year
- Daily lunch and snacks in office
- Company retreats

About Stellar
Stellar is more than a blockchain. Powered by a decentralized, fast, scalable, and uniquely sustainable network made for financial products and services, and by a thriving, passionate ecosystem that includes a mission-driven nonprofit organization, Stellar is paving the path to unlock the world’s economic potential through blockchain technology. Built with speed and low costs in mind, the Stellar network provides builders and financial institutions worldwide a platform to issue assets and to send and convert currencies in real time, creating real-world utility.

Founded in 2014, the Stellar Development Foundation (SDF) supports the continued development and growth of the Stellar network and also serves the ecosystem of NGOs, corporations, universities, small businesses, governments, and solo entrepreneurs building on the Stellar network through tooling, funding, and strategic collaborations. Together, Stellar is where blockchain meets the real world.
About the Stellar Development Foundation
The Stellar Development Foundation (SDF) is a nonprofit organization focused on working with and supporting change-makers to create equitable access to the global financial system through blockchain technology. SDF provides grants, investments, funding, and other awards to builders and organizations. SDF also develops resources and tooling on the Stellar network to help unlock real-world utility. As a nonprofit foundation, SDF puts the health of the Stellar network, the Stellar ecosystem, and its mission above all else.

We look forward to hearing from you!

Privacy Policy
By submitting your application, you agree to our use and processing of your data in accordance with our Privacy Policy.

SDF is committed to diversity in its workforce and is proud to be an equal opportunity employer. SDF does not make hiring or employment decisions on the basis of race, color, religion, creed, gender, national origin, age, disability, veteran status, marital status, pregnancy, sex, gender expression or identity, sexual orientation, citizenship, or any other basis protected by applicable local, state, or federal law.

Vacancy posted 16 hours ago

Similar jobs that could be interesting for you, based on the Director of Site Reliability Engineering in San Francisco, CA vacancy

Director of SRE: Lead Reliability & Platform Engineering
Stellar is seeking a Director of Site Reliability Engineering to lead a distributed SRE team and shape service operations. This role is crucial for improving the reliability and operational maturity of services within the Stellar ecosystem. The ideal candidate will have...
Suggested
Stellar
San Francisco, CA
16 hours ago
Site Reliability Engineer Scale & Resilience for AI Ops
...A high-growth AI startup in San Francisco is seeking a Site Reliability Engineer to lead the scaling of operational resilience. In this role, you will own system stability and debugging workflows while tackling complex failures and enhancing proactive operations. Ideal...
Suggested
Happy Robot
San Francisco, CA
2 days ago
Senior Site Reliability Engineer
$210k - $240k
...Join to apply for the Senior Site Reliability Engineer role at Alembic Technologies. This range is provided by Alembic Technologies. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more. Base pay range $210,000....
Suggested
Full time
Alembic Technologies
San Francisco, CA
2 days ago
Site Reliability Engineer
...millions of daily users while enabling our engineering teams to ship fast. You'll own the... ...building automation and tooling that improves reliability and partnering with engineering to... ...services What you'll bring ~5+ years in Site Reliability Engineering, DevOps, or...
Suggested
Work at office
Work from home
Gamma
San Francisco, CA
2 days ago
Senior Site Reliability Engineer
$175k - $250k
...Job Title: Senior Cloud Infrastructure Engineer Location: San Francisco, CA. Remote unavailable. Modality: On-Site only. Must live within commuting distance... ...while ensuring scalability, performance, and reliability across environments. What You’ll Do Design...
Suggested
Full time
Remote work
Relocation
Relocation package
The Recruiting Guy
San Francisco, CA
2 days ago
CloudDevs: Senior Site Reliability Engineer (SRE)
...CloudDevs works with fast-moving, venture-backed startups across the US. We’re building a pool of world-class Site Reliability Engineers for current roles and for upcoming opportunities. You will either be placed directly into one of our partner startups or added to our...
Local area
Breakout Tools
San Francisco, CA
2 days ago
Senior Site Reliability Engineer
...customer acquisition, and Connor was a machine learning research engineer at Scale AI. The rest of our team comes from companies like... ...of-the-art AI. As a Senior SRE, you'll tackle the scaling and reliability challenges that come with adding terabytes of data monthly and...
Unify
San Francisco, CA
2 days ago
Site Reliability Engineer - Inference
$56.25 - $137 per hour
...Join to apply for the Site Reliability Engineer - Inference role at Jobright.ai. Get AI-powered advice on this job and more exclusive...
Full time
Summer work
Internship
H1b
Shift work
jobright.com
San Francisco, CA
2 days ago
Director of Site Reliability Engineering
$210k - $310k
...Director of Site Reliability Engineering Interested in working on cutting-edge blockchain technology and creating equitable access to the global financial system? Since 2014, the mission-driven team at the Stellar Development Foundation (SDF) has helped fuel the tremendous...
Temporary work
Work at office
Local area
Remote work
Worldwide
Flexible hours
Crypto Pro Network
San Francisco, CA
2 days ago
Senior Manager, Site Reliability Engineering
$227.2k - $324.5k
...About the Role: Site Reliability Engineering (SRE) at Tubi is not a traditional operations team. We are a software engineering organization that applies a developer's mindset and toolkit to the challenges of building and running large-scale, distributed systems....
Full time
Contract work
Temporary work
Local area
Flexible hours
Tubi
San Francisco, CA
16 hours ago
Infrastructure Site Reliability Engineer (Local only)
$80 per hour
...Infrastructure Site Reliability Engineer (Local only) Job Description: Job Title: Infrastructure Site Reliability Engineer Job Type: Contract (4+ months) with strong possibility to convert to full-time Job...
Full time
Contract work
For contractors
Local area
2 days per week
Maxonic
San Francisco, CA
2 days ago
Site Reliability Engineer
...Prometheus, performance profiling As the SRE, you'll own the reliability and performance of the LiteLLM proxy in production. Our users... ...critical projects including: Fixing OOM issues — e.g. Prisma Query Engine unable to recover from OOMKill in K8s deployments, unbounded...
Litellm
San Francisco, CA
16 hours ago
Senior Site Reliability Engineer
...and enthusiasm for building a great culture and product, you will find a home at Fieldguide. About the Role As a Senior Site Reliability Engineer (SRE) at Fieldguide, you will be responsible for ensuring the reliability, scalability, and observability of our production...
Remote work
Work from home
Flexible hours
Fieldguide
San Francisco, CA
4 days ago
Sr. Site Reliability Engineer
$163k - $203k
...will be a senior technical contributor on the SRE team, responsible for the reliability, scalability, and security of Prosper’s Cloud Platform portfolio. This is as much of a platform engineering role as it is SRE role — you will maintain the applications that run on our...
Work experience placement
Work at office
Local area
Remote work
Flexible hours
2 days per week
Prosper
San Francisco, CA
2 days ago
Site Reliability Engineer
The role We're looking for a world-class Site Reliability Engineer to ensure the reliability, performance, and scalability of our AI infrastructure platform. You’ll be building and operating the core systems that power agentic AI at scale. Your mission: keep our ultra-...
Blaxel
San Francisco, CA
2 days ago
Site Reliability Engineer
$125k - $165k
Position: Site Reliability Engineer Location: San Francisco, CA Job Id: 434 # of Openings: 1 TELCOR Inc, a leading innovator in laboratory software, is looking for a Site Reliability Engineer to join our TELCOR AI Systems team! Do you have strong experience in cloud...
Temporary work
Work at office
Visa sponsorship
Work visa
Relocation package
Flexible hours
TELCOR
San Francisco, CA
4 days ago
Senior Site Reliability Engineer (Upmarket)
...alongside clinicians to make that possible. We’re a team of doctors, engineers, designers, researchers, and creatives building tools that... ...for leading incidents end-to-end. Improve operational reliability: Identify recurring issues and reliability risks, and drive fixes...
Work at office
Worldwide
Heidi Health Ltd
San Francisco, CA
1 day ago
Site Reliability Engineer
...and experiment constantly as we find the right paths in an AI-native landscape. The Role: You'll be the infrastructure and reliability engineer on the Data Replication team - a full-stack product team running over 3 million sync jobs a week powering thousands of data...
Local area
Airbyte
San Francisco, CA
4 days ago
Senior Site Reliability Engineer
...Responsibilities Lead and onboard services and teams to the reliability tenets. Establish and maintain Service Level Objectives (... ...Science or equivalent. 6+ years of experience in Site Reliability Engineering, managing infrastructure and services at scale. History of...
OutSystems, Inc.
San Francisco, CA
1 day ago
Senior Site Reliability Engineer
US Corp. is seeking a Lead Site Reliability Engineer to spearhead our mission of delivering highly available and performant systems. With an average of over 12 years of industry experience, the successful candidate will bridge the gap between software development and systems...
Axiom Pursuits
San Francisco, CA
1 day ago
Senior Site Reliability Engineer, Storage
$50 per hour
...years of professional SRE experience 5+ years of experience contributing to architecture and design (architecture, design patterns, reliability and scaling) of new and current systems Bachelor's Degree in Computer Science or related field, or 8+ years relevant work...
Temporary work
Work experience placement
Epoch Biodesign
San Francisco, CA
16 hours ago
Remote Senior Site Reliability Engineer — Observability & Resilience
Fieldguide is seeking a Senior Site Reliability Engineer to ensure the reliability and scalability of our production systems in San Francisco, CA. The role involves working closely with product teams to define reliability standards and build robust observability practices...
Remote job
Flexible hours
Fieldguide
San Francisco, CA
4 days ago
Senior Site Reliability Engineer
Senior Site Reliability Engineer. Locations: US - San Francisco Bay Area. Time type: Full time. Posted on: Posted Yesterday. Job requisition id: R1478. There are NO limits to your career: come...
Immediate start
Remote work
Worldwide
OutSystems Inc.
San Francisco, CA
1 day ago
Remote Site Reliability Engineer: Scale & Observability
$175k - $250k
WorkOS is seeking a Site Reliability Engineer to enhance the reliability and performance of its systems. As a key member of the SRE team, you will handle critical responsibilities like improving incident responses and collaborating...
Remote job
Flexible hours
I did my part and supported the Regular Toilet
San Francisco, CA
1 day ago
Senior Site Reliability Engineer, Spend
What you’ll do As a Senior Site Reliability Engineer, you’ll work closely with product teams in Spend to deliver and maintain scalable, reliable cloud infrastructure in support of key product initiatives. Aligned to the roadmap, you’ll lead on infrastructure design and...
Airwallex
San Francisco, CA
16 hours ago
Senior Site Reliability Engineer
$60 per hour
Senior Site Reliability Engineer (Copy) Seattle Hybrid (Hybrid location). Full-time. About Us Supio is a trusted AI platform purpose-built for law firms, reshaping how data drives impactful outcomes. Our innovative approach blends technology with deep legal expertise,...
Full time
Work at office
Flexible hours
Bonfirevc
San Francisco, CA
1 day ago
Site Reliability Engineer
$175k - $250k
...fully distributed across North American time zones and supports a fast‑growing customer base of SaaS companies. About the Site Reliability Engineering Team The Site Reliability Engineering (SRE) team ensures the WorkOS platform remains fast, reliable, and resilient at...
Remote work
I did my part and supported the Regular Toilet
San Francisco, CA
1 day ago
Senior Site Reliability Engineer
Senior Site Reliability Engineer, Hybrid - San Francisco. Our Mission & Values: At Drata, we help companies earn and keep the trust of... ...Job Summary: Drata's SRE team operates as both a central engineering function and an embedded reliability practice. You'll be part...
Work at office
Immediate start
Worldwide
Monday to Friday
Flexible hours
Careers at Drata
San Francisco, CA
2 days ago
Site Reliability Engineer
...manifesto. About the Role We're looking for an Infrastructure Engineer to take the lead on scaling our operational resilience as we... ...This is a high-impact, high-trust role where you’ll shape how reliability is done - reducing incident load, building internal tooling, and...
Worldwide
Shift work
Happyrobot Inc.
San Francisco, CA
1 day ago
Hyperbolic Labs - Senior Site Reliability Engineer
...co‑founders with PhDs in AI, Math, and Computer Science — is poised to redefine computing. About the Role We're seeking a Site Reliability Engineer to ensure Hyperbolic's GPU marketplace and AI infrastructure operate with exceptional reliability, performance, and...
deCircle
San Francisco, CA
16 hours ago