Staff Database Reliability Engineer

$200k - $250k

scribehow.com

About the role

We're hiring a Staff Database Reliability Engineer to own the strategy, architecture, and operational excellence of our data infrastructure. This is an expert-level IC role with deep influence on engineering direction, partnering closely with platform, backend, and DevOps engineers.

Why this role matters

You will own the data tier end-to-end. Design schemas and access patterns that scale, tune Aurora for latency and throughput, and set the standards for how engineers interact with our databases. When a migration script seizes up mid-deploy and writes start queueing behind an ACCESS EXCLUSIVE lock, your runbooks and automation resolve the incident quickly.

Make the Django ORM a strength, not a liability:
  • Review migrations for safety at scale: locks, backfills, concurrent index builds, NOT VALID constraints
  • Catch N+1 patterns and missing select_related/prefetch_related in review
  • Establish conventions for QuerySet usage and physical schema design (indexes, constraints, partitioning)
  • Scale review through automation, not heroics: author AGENTS.md files and DNA scaffolding that encode our conventions, configure AI review bots (Claude Code, Cursor, etc.) to flag risky migrations and ORM anti-patterns, and iterate on those configs as new failure modes emerge

Lead major infrastructure initiatives:
  • Capacity planning as traffic and engineering throughput grow
  • Zero-downtime schema migrations and cutovers
  • Multi-AZ resilience within a single region: Aurora writer/reader placement, failover behavior and RTO/RPO, ElastiCache and OpenSearch AZ topology, RabbitMQ survivability across AZs
  • Backups, PITR, failover testing, retention

Own the CDC pipeline (Aurora → DMS → S3 Parquet → Snowflake):
  • DMS task design and tuning, replication slot hygiene on the Postgres side
  • Schema evolution as Django migrations roll through, so a column rename doesn't silently break the warehouse at 6 AM
  • Parquet layout and partitioning, reliability of the Snowflake handoff
  • Automated checks that flag migrations likely to break downstream consumers

Drive observability across three complementary tools:
  • pganalyze: query-level performance, index advisor, schema insights; the go-to for 'why is this ORM query slow'
  • CloudWatch: infrastructure metrics and alarms for Aurora, OpenSearch, ElastiCache, SQS, DMS
  • Honeycomb: high-cardinality tracing that ties slow DB calls back to users, flags, deploys, and flows
  • Shape how the three fit together, including Django-side instrumentation and trace attributes on ORM queries

Build tooling and guardrails:
  • Migration review automation and CI checks for risky patterns
  • Slow query pipelines fed from pganalyze
  • Self-service dashboards so teams understand their own query footprint

Support and evolve the rest of the stack:
  • OpenSearch: index design, sharding, mapping changes, reindexing strategy, Django-side indexing pipelines
  • Redis: caching patterns, eviction, sizing, Django cache framework, Celery/RQ usage, avoiding hot keys and thundering herds
  • SQS + RabbitMQ: queue design, DLQs, visibility timeouts, exchange/queue topology, AZ mirroring, consumer backpressure, Celery behavior under load

What makes you a great fit

Core expertise:
  • Deep PostgreSQL: EXPLAIN (ANALYZE, BUFFERS), MVCC, bloat, lock contention, vacuum/autovacuum. Aurora Serverless v2 / Limitless experience strongly preferred (storage model, reader/writer split, ACU scaling)
  • Strong ORM fluency (Django, SQLAlchemy, ActiveRecord, or similar): you can predict the SQL a query will generate, spot N+1 problems on sight, and know how to control eager loading (joins vs. batched IN queries), column projection, aggregations, and subqueries
  • Single-region multi-AZ design: practical understanding of what it does and doesn't protect against

Data movement and observability:
  • Production CDC experience, ideally AWS DMS: comfortable with logical replication, slot hygiene, schema evolution, and Parquet-based data lakes feeding Snowflake (or BigQuery/Redshift)
  • Hands-on with pganalyze (or Datadog DBM / Performance Insights / pg_stat_statements pipelines), CloudWatch (custom metrics, composite alarms, log insights), and Honeycomb (or another high-cardinality tracing tool); comfortable with OpenTelemetry and opinionated about what makes a trace useful

AI-assisted workflow:
  • Real experience making AI coding and review tools useful for a team: writing AGENTS.md files, configuring review agents, versioning and iterating on prompts and configs

The rest of the stack:
  • OpenSearch at scale: sizing, sharding, JVM tuning, rolling upgrades, snapshots
  • Production Redis: persistence tradeoffs, cluster mode, hot keys, thundering herds
  • At least one production message broker (SQS, RabbitMQ, Kafka): delivery semantics, idempotency, failure modes

Engineering and leadership:
  • Strong automation and IaC background: real code (Python, Go, or similar) and Terraform
  • Track record leading cross-team initiatives, writing design docs that hold up, influencing without authority
  • Comfortable in a high-growth environment where the right answer for 50 engineers isn't the right answer for 100
  • Pragmatic outlook during incidents, focused on preventing the next one

Full-Time US Employee Benefits Include
  • Some of the nicest and smartest teammates you’ll ever work with
  • Competitive salaries
  • Comprehensive healthcare benefits
  • Exciting and motivating equity
  • Flexible PTO
  • 401k
  • Parental Leave
  • Commuter Benefits (SF office employees)
  • WFH Stipend

Compensation

$200k-$250k base + equity. We consider several factors when determining compensation, including location, experience, and other job-related factors.

At Scribe, we celebrate our differences and are committed to creating a workplace where all employees feel supported and empowered to do their best work. We believe this benefits not only our employees but our product, customers, and community as well. Scribe is proud to be an Equal Opportunity Employer.

Vacancy posted 3 days ago
Similar jobs that could be interesting for you
Based on the Staff Database Reliability Engineer in San Francisco, CA vacancy
  •  ...who hold a high bar, move fast, and care deeply about each other and our customers. About the Role We’re hiring a Senior Database Reliability Engineer to own the reliability, performance, and scalability of Scribe’s data tier. Our engineering org is doubling — which... 
    Suggested
    Full time
    Work at office
    Remote work
    Home office
    Flexible hours
    3 days per week

    scribehow.com

    San Francisco, CA
    5 days ago
  • $160k - $220k

     ...This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk. Senior Database Reliability Engineer (DBRE)  Experience Level: Mid–Senior (4+ years PostgreSQL experience) About the Role We are looking for a highly... 
    Suggested
    Permanent employment
    Work at office
    Local area
    Worldwide
    Flexible hours

    Okta

    San Francisco, CA
    a month ago
  • $225k - $290k

    P2P is hiring a senior leader to define and implement data reliability and quality standards. The role involves cross-team initiatives to enhance data observability and governance across the organization. The ideal candidate has extensive experience in scalable data platforms... 
    Suggested
    Remote job

    P2P

    San Francisco, CA
    3 days ago
  • Hudson Manpower is seeking a Mechanical Engineer - Offshore Reliability for a role involving the improvement of offshore mechanical equipment reliability and performance. This position requires a Bachelor's Degree in Mechanical Engineering and a minimum of 12 years of experience... 
    Suggested

    Hudson Manpower

    San Francisco, CA
    2 days ago
  • $150k - $180k

     ...’t it. The Role As we continue to develop and deploy cutting-edge autonomous technologies, we are seeking a Senior Reliability Engineer (REL) to lead efforts in ensuring the long-term performance, durability, and robustness of critical hardware systems. This role... 
    Suggested
    Full time
    Immediate start
    Worldwide
    Flexible hours
    Night shift

    Eight Sleep

    San Francisco, CA
    19 days ago
  • $160k - $190k

    Southern Recruiting Solutions, Inc. seeks a Sr. Reliability Engineer based in San Francisco, California. This role requires a Bachelor's in Mechanical Engineering and over 8 years of experience in a chemical plant or refinery. The successful candidate will conduct root... 

    Southern Recruiting Solutions, Inc.

    San Francisco, CA
    2 days ago
  • Responsibilities The Sr. Reliability Engineer will conduct root cause failure analysis (RCFA) to identify equipment breakdown causes and develop solutions to prevent recurrence. Perform reliability-centered maintenance (RCM) studies to identify critical equipment and... 
    Relocation

    Southern Recruiting Solutions, Inc.

    San Francisco, CA
    2 days ago
  • A leading AI research company in San Francisco is seeking a Software Engineer to enhance infrastructure supporting cutting-edge AI systems. The role involves designing reliable systems and optimizing performance for millions of users. Ideal candidates possess experience... 

    OpenAI

    San Francisco, CA
    3 days ago
  • A leading AI research company based in San Francisco is seeking experienced reliability engineers to scale their infrastructure and ensure system performance and reliability. This role involves collaborating with diverse teams to develop resilient systems and enhance operations... 

    OpenAI

    San Francisco, CA
    2 days ago
  • $150k

    A technology company in San Francisco seeks a Research Engineer to develop their reliability platform for LLM applications. The role focuses on optimization and testing methodologies while emphasizing hands-on implementation and collaboration with clients. Ideal candidates... 

    Enboarder

    San Francisco, CA
    1 day ago
  • $180k - $230k

     ...Job Description Job Title: Staff Reliability Engineer Location: Burlingame, CA Department: ESS Engineering Reports To: Staff Reliability Engineer Position Type: Full-time About Peak Energy Peak Energy is the first American... 
    Full time
    Immediate start
    Flexible hours

    Peak Energy

    San Francisco, CA
    6 days ago
  • scribehow.com is seeking a Senior Database Reliability Engineer based in San Francisco (hybrid model). You will own the reliability, performance, and scalability of our data tier and work with a growing engineering team. Your expertise will ensure smooth operations across... 
    Remote job

    scribehow.com

    San Francisco, CA
    5 days ago
  • A leading AI research organization in San Francisco is seeking a cross-stack engineer to ensure reliability in next-generation AI systems. This hands-on position requires extensive experience in reliability modeling and DFX architecture to enhance the durability and performance... 

    OpenAI

    San Francisco, CA
    1 day ago
  • We’re looking for a Systems Reliability Engineer to own the reliability of our system across cloud, edge, and real-world environments. Our platform runs across distributed infrastructure—connecting cloud services, on-site compute, and live video/data pipelines inside... 
    Permanent employment

    Claryo

    San Francisco, CA
    5 days ago
  • $175k - $300k

    Fluidstack, located in San Francisco, is seeking a Production Engineer to ensure the health of their compute fleet. You will build metrics pipelines and automate repair workflows, defining what production-ready hardware means. The ideal candidate has strong hardware intuition... 

    Fluidstack

    San Francisco, CA
    5 days ago
  • $293k - $385k

     ...About the Team The Infrastructure Engineering function sits within IT and is responsible for reliably building, deploying, and operating critical on prem and hybrid environments that power internal services and critical R&D environments. This is a new, bootstrap... 
    Work at office

    OpenAI

    San Francisco, CA
    3 days ago
  • $150k - $250k

     ...As our Founding Security Reliability Engineer at Charta Health, you'll pioneer the application of Site Reliability Engineering principles to ensure the unwavering security, resilience, and operational excellence of our cutting-edge generative AI platform. This is... 

    Charta Health

    San Francisco, CA
    2 days ago
  •  ...enable hardware optimized specifically for AI. About the Role We are seeking a highly skilled cross-stack engineer with deep expertise in making ML systems reliable at scale. This hands-on individual contributor will sit within our hardware team and work closely with... 

    OpenAI

    San Francisco, CA
    1 day ago
  •  ...Staff Data Architect At Komodo Health, our mission is to reduce...  ...keeping these platforms reliable, scalable, and secure, the team...  ...partner closely with analytics engineers, data scientists,...  ...accomplished… Reduced database spend by at least 15% through... 

    Komodo Health

    San Francisco, CA
    3 days ago
  • $190k - $250k

     ...to manage heart disease. As a Staff Data Architect, you will lead...  ...for our transactional databases, analytical Data Lake, and the...  ...partner deeply with Software Engineering, IT, Research, and Systems Engineering...  ...workloads remain fast, reliable, and secure. Lead Enterprise... 
    Work experience placement
    Local area
    Worldwide
    Relocation

    HeartFlow

    San Francisco, CA
    4 days ago
  •  ...Postgres Database Internals Engineer ParadeDB is a transactional alternative to Elasticsearch built on Postgres. We build state-of-the-art full-text search and columnar analytics as a Postgres extension. Companies like Modern Treasury, BILT Rewards, Alibaba Cloud, many... 
    Full time
    Work at office

    ParadeDB

    San Francisco, CA
    3 days ago
  •  ...operation, and maintenance of enterprise database environments across Oracle and SQL...  ...optimization strategies to improve efficiency and reliability. - Manage backup and recovery...  ...leadership and oversight to junior database engineers, ensuring adherence to standards, best... 
    Minimum wage
    Contract work
    Temporary work
    Work experience placement
    Remote work

    MAXIMUS

    San Francisco, CA
    5 days ago
  •  ...planning, building, deployment, and maintenance of enterprise database environments across Oracle and SQL platforms. - Perform database...  ...database-related issues. - Assist senior database engineers with performance tuning, optimization, and ongoing improvement... 
    Minimum wage
    Contract work
    Temporary work
    Work experience placement
    Remote work

    MAXIMUS

    San Francisco, CA
    5 days ago
  • $150k - $250k

    Madrona Venture Labs is seeking a Founding Security Reliability Engineer in San Francisco to design and maintain secure infrastructure for generative AI healthcare solutions. This pivotal role focuses on applying SRE principles to bolster security within a regulated environment... 

    Madrona Venture Labs

    San Francisco, CA
    3 days ago
  •  ...for talent across our geographies. Responsibilities Define reliability vision and roadmap, build and mentor a top-tier team, and embed...  ...equivalent industry experience in electronics or reliability engineering. 10+ years of experience in reliability engineering for... 
    Worldwide

    Reliabilityweb.com

    San Francisco, CA
    3 days ago
  • A leading data and AI company in San Francisco seeks a Senior Software Engineer for Database Engine Internals. You will design and implement advanced query systems that outperform current technologies. Candidates should have experience in query optimization and a strong... 

    Databricks Inc.

    San Francisco, CA
    1 day ago
  • $200k - $250k

    Scribehow.com is looking for a Staff Database Reliability Engineer to take charge of their data infrastructure strategy and architecture. In this role, you will design scalable access patterns, drive observability across tools like pganalyze and Honeycomb, and lead infrastructure... 
    Flexible hours

    scribehow.com

    San Francisco, CA
    3 days ago
  • $350k

    Menlo Ventures is seeking a Research Engineer to enhance the reliability and infrastructure of AI systems focused on professional workflows. The ideal candidate will have substantial Python coding experience and a strong background in operating machine learning systems... 
    Work at office

    Menlo Ventures

    San Francisco, CA
    2 days ago
  • $68.64k - $93.06k

    Staff Research Associate III (Data Scientist) Job Category : Staff Research Assoc Requisition Number : STAFF001507 Apply now Posted...  ...and Processing Collaborate with biomechanics researchers and engineers to collect and integrate large-scale biomechanical datasets.... 
    Full time
    Work at office

    NCIRE

    San Francisco, CA
    3 days ago
  • $233.5k - $350.5k

     ...GoFundMe team is searching for our next Senior Staff Data Platform Architect to help design and...  ...experiences. You’ll partner closely with engineering, product, analytics, and data science to build systems that are reliable, efficient, and built for long‑term growth.... 
    Full time
    Work at office
    Flexible hours

    GoFundMe

    San Francisco, CA
    2 days ago
