Staff Database Reliability Engineer

$200k - $250k

scribehow.com

About the role

We're hiring a Staff Database Reliability Engineer to own the strategy, architecture, and operational excellence of our data infrastructure. This is an expert-level IC role with deep influence on engineering direction, partnering closely with platform, backend, and DevOps engineers.

Why this role matters

You will own the data tier end-to-end. Design schemas and access patterns that scale, tune Aurora for latency and throughput, and set the standards for how engineers interact with our databases. When a migration script seizes up mid-deploy and writes start queueing behind an ACCESS EXCLUSIVE lock, your runbooks and automation resolve the incident quickly.

Make the Django ORM a strength, not a liability:
- Review migrations for safety at scale: locks, backfills, concurrent index builds, NOT VALID constraints
- Catch N+1 patterns and missing select_related/prefetch_related in review
- Establish conventions for QuerySet usage and physical schema design (indexes, constraints, partitioning)
- Scale review through automation, not heroics: author AGENTS.md files and DNA scaffolding that encode our conventions, configure AI review bots (Claude Code, Cursor, etc.) to flag risky migrations and ORM anti-patterns, and iterate on those configs as new failure modes emerge

Lead major infrastructure initiatives:
- Capacity planning as traffic and engineering throughput grow
- Zero-downtime schema migrations and cutovers
- Multi-AZ resilience within a single region: Aurora writer/reader placement, failover behavior and RTO/RPO, ElastiCache and OpenSearch AZ topology, RabbitMQ survivability across AZs
- Backups, PITR, failover testing, retention

Own the CDC pipeline (Aurora → DMS → S3 Parquet → Snowflake):
- DMS task design and tuning, replication slot hygiene on the Postgres side
- Schema evolution as Django migrations roll through, so a column rename doesn't silently break the warehouse at 6 AM
- Parquet layout and partitioning, reliability of the Snowflake handoff
- Automated checks that flag migrations likely to break downstream consumers

Drive observability across three complementary tools:
- pganalyze: query-level performance, index advisor, schema insights; the go-to for 'why is this ORM query slow'
- CloudWatch: infrastructure metrics and alarms for Aurora, OpenSearch, ElastiCache, SQS, DMS
- Honeycomb: high-cardinality tracing that ties slow DB calls back to users, flags, deploys, and flows
- Shape how the three fit together, including Django-side instrumentation and trace attributes on ORM queries

Build tooling and guardrails:
- Migration review automation and CI checks for risky patterns
- Slow query pipelines fed from pganalyze
- Self-service dashboards so teams understand their own query footprint

Support and evolve the rest of the stack:
- OpenSearch: index design, sharding, mapping changes, reindexing strategy, Django-side indexing pipelines
- Redis: caching patterns, eviction, sizing, Django cache framework, Celery/RQ usage, avoiding hot keys and thundering herds
- SQS + RabbitMQ: queue design, DLQs, visibility timeouts, exchange/queue topology, AZ mirroring, consumer backpressure, Celery behavior under load

What makes you a great fit

Core expertise:
- Deep PostgreSQL: EXPLAIN (ANALYZE, BUFFERS), MVCC, bloat, lock contention, vacuum/autovacuum. Aurora Serverless v2 / Limitless experience strongly preferred (storage model, reader/writer split, ACU scaling)
- Strong ORM fluency (Django, SQLAlchemy, ActiveRecord, or similar): you can predict the SQL a query will generate, spot N+1 problems on sight, and control eager loading (joins vs. batched IN queries), column projection, aggregations, and subqueries
- Single-region multi-AZ design: practical understanding of what it does and doesn't protect against

Data movement and observability:
- Production CDC experience, ideally AWS DMS: comfortable with logical replication, slot hygiene, schema evolution, and Parquet-based data lakes feeding Snowflake (or BigQuery/Redshift)
- Hands-on with pganalyze (or Datadog DBM / Performance Insights / pg_stat_statements pipelines), CloudWatch (custom metrics, composite alarms, log insights), and Honeycomb (or another high-cardinality tracing tool); comfortable with OpenTelemetry and opinionated about what makes a trace useful

AI-assisted workflow:
- Real experience making AI coding and review tools useful for a team: writing AGENTS.md files, configuring review agents, versioning and iterating on prompts and configs

The rest of the stack:
- OpenSearch at scale: sizing, sharding, JVM tuning, rolling upgrades, snapshots
- Production Redis: persistence tradeoffs, cluster mode, hot keys, thundering herds
- At least one production message broker (SQS, RabbitMQ, Kafka): delivery semantics, idempotency, failure modes

Engineering and leadership:
- Strong automation and IaC background: real code (Python, Go, or similar) and Terraform
- Track record leading cross-team initiatives, writing design docs that hold up, influencing without authority
- Comfortable in a high-growth environment where the right answer for 50 engineers isn't the right answer for 100
- Pragmatic outlook during incidents, focused on preventing the next one

Full-Time US Employee Benefits Include
- Some of the nicest and smartest teammates you’ll ever work with
- Competitive salaries
- Comprehensive healthcare benefits
- Exciting and motivating equity
- Flexible PTO
- 401k
- Parental Leave
- Commuter Benefits (SF office employees)
- WFH Stipend

Compensation

$200k-$250k base + equity. We consider several factors when determining compensation, including location, experience, and other job-related factors.

At Scribe, we celebrate our differences and are committed to creating a workplace where all employees feel supported and empowered to do their best work. We believe this benefits not only our employees but our product, customers, and community as well. Scribe is proud to be an Equal Opportunity Employer.

Vacancy posted 3 days ago

Similar jobs that could be interesting for you
Based on the Staff Database Reliability Engineer in San Francisco, CA vacancy

Senior Database Reliability Engineer
...who hold a high bar, move fast, and care deeply about each other and our customers. About the Role We’re hiring a Senior Database Reliability Engineer to own the reliability, performance, and scalability of Scribe’s data tier. Our engineering org is doubling — which...
Suggested
Full time
Work at office
Remote work
Home office
Flexible hours
3 days per week
scribehow.com
San Francisco, CA
5 days ago
Senior Database Reliability Engineer (DBRE)
$160k - $220k
...This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk. Senior Database Reliability Engineer (DBRE) Experience Level: Mid–Senior (4+ years PostgreSQL experience) About the Role We are looking for a highly...
Suggested
Permanent employment
Work at office
Local area
Worldwide
Flexible hours
Okta
San Francisco, CA
a month ago
Senior Data Reliability & Governance Engineer — Remote
$225k - $290k
P2P is hiring a senior leader to define and implement data reliability and quality standards. The role involves cross-team initiatives to enhance data observability and governance across the organization. The ideal candidate has extensive experience in scalable data platforms...
Suggested
Remote job
P2P
San Francisco, CA
3 days ago
Senior Offshore Mechanical Reliability Engineer
Hudson Manpower is seeking a Mechanical Engineer - Offshore Reliability for a role involving the improvement of offshore mechanical equipment reliability and performance. This position requires a Bachelor's Degree in Mechanical Engineering and a minimum of 12 years of experience...
Suggested
Hudson Manpower
San Francisco, CA
2 days ago
Senior Reliability Engineer
$150k - $180k
...’t it. The Role As we continue to develop and deploy cutting-edge autonomous technologies, we are seeking a Senior Reliability Engineer (REL) to lead efforts in ensuring the long-term performance, durability, and robustness of critical hardware systems. This role...
Suggested
Full time
Immediate start
Worldwide
Flexible hours
Night shift
Eight Sleep
San Francisco, CA
19 days ago
Senior Reliability Engineer - Rotating Equipment
$160k - $190k
Southern Recruiting Solutions, Inc. seeks a Sr. Reliability Engineer based in San Francisco, California. This role requires a Bachelor's in Mechanical Engineering and over 8 years of experience in a chemical plant or refinery. The successful candidate will conduct root...
Southern Recruiting Solutions, Inc.
San Francisco, CA
2 days ago
Sr. Reliability Engineer - rotating equipment
Responsibilities The Sr. Reliability Engineer will conduct root cause failure analysis (RCFA) to identify equipment breakdown causes and develop solutions to prevent recurrence. Perform reliability-centered maintenance (RCM) studies to identify critical equipment and...
Relocation
Southern Recruiting Solutions, Inc.
San Francisco, CA
2 days ago
Infra Reliability Engineer: Scale, Observability & Security
A leading AI research company in San Francisco is seeking a Software Engineer to enhance infrastructure supporting cutting-edge AI systems. The role involves designing reliable systems and optimizing performance for millions of users. Ideal candidates possess experience...
OpenAI
San Francisco, CA
3 days ago
Reliability Engineer: Scale Systems, Observe & Automate
A leading AI research company based in San Francisco is seeking experienced reliability engineers to scale their infrastructure and ensure system performance and reliability. This role involves collaborating with diverse teams to develop resilient systems and enhance operations...
OpenAI
San Francisco, CA
2 days ago
LLM Reliability Engineer: Fuzz Testing & Production
$150k
A technology company in San Francisco seeks a Research Engineer to develop their reliability platform for LLM applications. The role focuses on optimization and testing methodologies while emphasizing hands-on implementation and collaboration with clients. Ideal candidates...
Enboarder
San Francisco, CA
1 day ago
Staff Reliability Engineer
$180k - $230k
...Job Description Job Title: Staff Reliability Engineer Location: Burlingame, CA Department: ESS Engineering Reports To: Staff Reliability Engineer Position Type: Full-time About Peak Energy Peak Energy is the first American...
Full time
Immediate start
Flexible hours
Peak Energy
San Francisco, CA
6 days ago
Senior DB Reliability Engineer (Remote)
scribehow.com is seeking a Senior Database Reliability Engineer based in San Francisco (hybrid model). You will own the reliability, performance, and scalability of our data tier and work with a growing engineering team. Your expertise will ensure smooth operations across...
Remote job
scribehow.com
San Francisco, CA
5 days ago
Senior Reliability & DFX Engineer for AI Accelerators
A leading AI research organization in San Francisco is seeking a cross-stack engineer to ensure reliability in next-generation AI systems. This hands-on position requires extensive experience in reliability modeling and DFX architecture to enhance the durability and performance...
OpenAI
San Francisco, CA
1 day ago
System Reliability Engineering
We’re looking for a Systems Reliability Engineer to own the reliability of our system across cloud, edge, and real-world environments. Our platform runs across distributed infrastructure, connecting cloud services, on-site compute, and live video/data pipelines inside...
Permanent employment
Claryo
San Francisco, CA
5 days ago
Hyperscale Compute Reliability Engineer
$175k - $300k
Fluidstack, located in San Francisco, is seeking a Production Engineer to ensure the health of their compute fleet. You will build metrics pipelines and automate repair workflows, defining what production-ready hardware means. The ideal candidate has strong hardware intuition...
Fluidstack
San Francisco, CA
5 days ago
Security Reliability Engineer
$293k - $385k
...About the Team The Infrastructure Engineering function sits within IT and is responsible for reliably building, deploying, and operating critical on prem and hybrid environments that power internal services and critical R&D environments. This is a new, bootstrap...
Work at office
OpenAI
San Francisco, CA
3 days ago
Founding Security Reliability Engineer
$150k - $250k
...As our Founding Security Reliability Engineer at Charta Health, you'll pioneer the application of Site Reliability Engineering principles to ensure the unwavering security, resilience, and operational excellence of our cutting-edge generative AI platform. This is...
Charta Health
San Francisco, CA
2 days ago
Reliability/DFX Engineer
...enable hardware optimized specifically for AI. About the Role We are seeking a highly skilled cross-stack engineer with deep expertise in making ML systems reliable at scale. This hands-on individual contributor will sit within our hardware team and work closely with...
OpenAI
San Francisco, CA
1 day ago
Staff Data Architect
...Staff Data Architect At Komodo Health, our mission is to reduce... ...keeping these platforms reliable, scalable, and secure, the team... ...partner closely with analytics engineers, data scientists,... ...accomplished… Reduced database spend by at least 15% through...
Komodo Health
San Francisco, CA
3 days ago
Staff Data Architect
$190k - $250k
...to manage heart disease. As a Staff Data Architect, you will lead... ...for our transactional databases, analytical Data Lake, and the... ...partner deeply with Software Engineering, IT, Research, and Systems Engineering... ...workloads remain fast, reliable, and secure. Lead Enterprise...
Work experience placement
Local area
Worldwide
Relocation
HeartFlow
San Francisco, CA
4 days ago
Database Engineer Storage & Query Execution
...Postgres Database Internals Engineer ParadeDB is a transactional alternative to Elasticsearch built on Postgres. We build state-of-the-art full-text search and columnar analytics as a Postgres extension. Companies like Modern Treasury, BILT Rewards, Alibaba Cloud, many...
Full time
Work at office
ParadeDB
San Francisco, CA
3 days ago
Senior Database Engineer
...operation, and maintenance of enterprise database environments across Oracle and SQL... ...optimization strategies to improve efficiency and reliability. - Manage backup and recovery... ...leadership and oversight to junior database engineers, ensuring adherence to standards, best...
Minimum wage
Contract work
Temporary work
Work experience placement
Remote work
MAXIMUS
San Francisco, CA
5 days ago
Database Engineer
...planning, building, deployment, and maintenance of enterprise database environments across Oracle and SQL platforms. - Perform database... ...database-related issues. - Assist senior database engineers with performance tuning, optimization, and ongoing improvement...
Minimum wage
Contract work
Temporary work
Work experience placement
Remote work
MAXIMUS
San Francisco, CA
5 days ago
Founding Security Reliability Engineer - Equity & Growth
$150k - $250k
Madrona Venture Labs is seeking a Founding Security Reliability Engineer in San Francisco to design and maintain secure infrastructure for generative AI healthcare solutions. This pivotal role focuses on applying SRE principles to bolster security within a regulated environment...
Madrona Venture Labs
San Francisco, CA
3 days ago
Director, Reliability Engineering
...for talent across our geographies. Responsibilities Define reliability vision and roadmap, build and mentor a top-tier team, and embed... ...equivalent industry experience in electronics or reliability engineering. 10+ years of experience in reliability engineering for...
Worldwide
Reliabilityweb.com
San Francisco, CA
3 days ago
Senior Database Engine Engineer: Next-Gen Query Systems
A leading data and AI company in San Francisco seeks a Senior Software Engineer for Database Engine Internals. You will design and implement advanced query systems that outperform current technologies. Candidates should have experience in query optimization and a strong...
Databricks Inc.
San Francisco, CA
1 day ago
Staff DB Reliability Engineer — Equity & Flexible PTO
$200k - $250k
Scribehow.com is looking for a Staff Database Reliability Engineer to take charge of their data infrastructure strategy and architecture. In this role, you will design scalable access patterns, drive observability across tools like pganalyze and Honeycomb, and lead infrastructure...
Flexible hours
scribehow.com
San Francisco, CA
3 days ago
RL Infrastructure Reliability Engineer
$350k
Menlo Ventures is seeking a Research Engineer to enhance the reliability and infrastructure of AI systems focused on professional workflows. The ideal candidate will have substantial Python coding experience and a strong background in operating machine learning systems...
Work at office
Menlo Ventures
San Francisco, CA
2 days ago
Staff Research Associate III (Data Scientist)
$68.64k - $93.06k
Staff Research Associate III (Data Scientist) Job Category : Staff Research Assoc Requisition Number : STAFF001507 Apply now Posted... ...and Processing Collaborate with biomechanics researchers and engineers to collect and integrate large-scale biomechanical datasets....
Full time
Work at office
NCIRE
San Francisco, CA
3 days ago
Senior Staff Data Platform Architect
$233.5k - $350.5k
...GoFundMe team is searching for our next Senior Staff Data Platform Architect to help design and... ...experiences. You’ll partner closely with engineering, product, analytics, and data science to build systems that are reliable, efficient, and built for long‑term growth....
Full time
Work at office
Flexible hours
GoFundMe
San Francisco, CA
2 days ago
