Staff Database Reliability Engineer
$200k - $250k
scribehow.com
About the role
We're hiring a Staff Database Reliability Engineer to own the strategy, architecture, and operational excellence of our data infrastructure. This is an expert-level IC role with deep influence on engineering direction, partnering closely with platform, backend, and DevOps engineers.

Why this role matters
You will own the data tier end-to-end: design schemas and access patterns that scale, tune Aurora for latency and throughput, and set the standards for how engineers interact with our databases. When a migration script seizes up mid-deploy and writes start queueing behind an ACCESS EXCLUSIVE lock, your runbooks and automation resolve the incident quickly.

Make the Django ORM a strength, not a liability:
- Review migrations for safety at scale: locks, backfills, concurrent index builds, NOT VALID constraints
- Catch N+1 patterns and missing select_related/prefetch_related in review
- Establish conventions for QuerySet usage and physical schema design (indexes, constraints, partitioning)
- Scale review through automation, not heroics: author AGENTS.md files and DNA scaffolding that encode our conventions, configure AI review bots (Claude Code, Cursor, etc.) to flag risky migrations and ORM anti-patterns, and iterate on those configs as new failure modes emerge

Lead major infrastructure initiatives:
- Capacity planning as traffic and engineering throughput grow
- Zero-downtime schema migrations and cutovers
- Multi-AZ resilience within a single region: Aurora writer/reader placement, failover behavior and RTO/RPO, ElastiCache and OpenSearch AZ topology, RabbitMQ survivability across AZs
- Backups, PITR, failover testing, retention

Own the CDC pipeline (Aurora → DMS → S3 Parquet → Snowflake):
- DMS task design and tuning, replication slot hygiene on the Postgres side
- Schema evolution as Django migrations roll through, so a column rename doesn't silently break the warehouse at 6 AM
- Parquet layout and partitioning, reliability of the Snowflake handoff
- Automated checks that flag migrations likely to break downstream consumers

Drive observability across three complementary tools:
- pganalyze: query-level performance, index advisor, schema insights; the go-to for "why is this ORM query slow?"
- CloudWatch: infrastructure metrics and alarms for Aurora, OpenSearch, ElastiCache, SQS, DMS
- Honeycomb: high-cardinality tracing that ties slow DB calls back to users, flags, deploys, and flows
- Shape how the three fit together, including Django-side instrumentation and trace attributes on ORM queries

Build tooling and guardrails:
- Migration review automation and CI checks for risky patterns
- Slow query pipelines fed from pganalyze
- Self-service dashboards so teams understand their own query footprint

Support and evolve the rest of the stack:
- OpenSearch: index design, sharding, mapping changes, reindexing strategy, Django-side indexing pipelines
- Redis: caching patterns, eviction, sizing, Django cache framework, Celery/RQ usage, avoiding hot keys and thundering herds
- SQS + RabbitMQ: queue design, DLQs, visibility timeouts, exchange/queue topology, AZ mirroring, consumer backpressure, Celery behavior under load

What makes you a great fit

Core expertise:
- Deep PostgreSQL: EXPLAIN (ANALYZE, BUFFERS), MVCC, bloat, lock contention, vacuum/autovacuum. Aurora Serverless v2 / Limitless experience strongly preferred (storage model, reader/writer split, ACU scaling)
- Strong ORM fluency (Django, SQLAlchemy, ActiveRecord, or similar): you can predict the SQL a query will generate, spot N+1 problems on sight, and control eager loading (joins vs. batched IN queries), column projection, aggregations, and subqueries
- Single-region multi-AZ design: a practical understanding of what it does and doesn't protect against

Data movement and observability:
- Production CDC experience, ideally with AWS DMS; comfortable with logical replication, slot hygiene, schema evolution, and Parquet-based data lakes feeding Snowflake (or BigQuery/Redshift)
- Hands-on with pganalyze (or Datadog DBM / Performance Insights / pg_stat_statements pipelines), CloudWatch (custom metrics, composite alarms, Logs Insights), and Honeycomb (or another high-cardinality tracing tool); comfortable with OpenTelemetry and opinionated about what makes a trace useful

AI-assisted workflow:
- Real experience making AI coding and review tools useful for a team: writing AGENTS.md files, configuring review agents, versioning and iterating on prompts and configs

The rest of the stack:
- OpenSearch at scale: sizing, sharding, JVM tuning, rolling upgrades, snapshots
- Production Redis: persistence tradeoffs, cluster mode, hot keys, thundering herds
- At least one production message broker (SQS, RabbitMQ, Kafka): delivery semantics, idempotency, failure modes

Engineering and leadership:
- Strong automation and IaC background: real code (Python, Go, or similar) and Terraform
- Track record leading cross-team initiatives, writing design docs that hold up, and influencing without authority
- Comfortable in a high-growth environment where the right answer for 50 engineers isn't the right answer for 100
- Pragmatic during incidents, with a focus on preventing the next one

Full-Time US Employee
Benefits include:
- Some of the nicest and smartest teammates you’ll ever work with
- Competitive salaries
- Comprehensive healthcare benefits
- Exciting and motivating equity
- Flexible PTO
- 401k
- Parental leave
- Commuter benefits (SF office employees)
- WFH stipend

Compensation
$200k-$250k base + equity. We consider several factors when determining compensation, including location, experience, and other job-related factors.

At Scribe, we celebrate our differences and are committed to creating a workplace where all employees feel supported and empowered to do their best work. We believe this benefits not only our employees but also our product, customers, and community. Scribe is proud to be an Equal Opportunity Employer.



