Site Reliability Engineering Tech Lead

Acryl Data

DataHub is an AI & Data Context Platform adopted by over 3,000 enterprises, including Apple, CVS Health, Netflix, and Visa. Innovated jointly with a thriving open-source community of 13,000+ members, DataHub's metadata graph provides in-depth context of AI and data assets with best-in-class scalability and extensibility.

The company's enterprise SaaS offering, DataHub Cloud, delivers a fully managed solution with AI-powered discovery, observability, and governance capabilities. Organizations rely on DataHub solutions to accelerate time-to-value from their data investments, ensure AI system reliability, and implement unified governance, enabling AI & data to work together and bring order to data chaos.

About the Role

We're seeking an experienced Site Reliability Engineering (SRE) Tech Lead to join DataHub and drive the reliability, scalability, and operational excellence of our platform offerings. In this role, you'll lead technical initiatives across DataHub Cloud and our emerging enterprise deployment solution, which provides customers with enhanced control and flexibility for running DataHub in their preferred environments.

Key Responsibilities

Technical Leadership & Architecture

  • Design and implement robust, scalable infrastructure solutions for DataHub Cloud and enterprise deployments
  • Lead the technical vision for multi-cloud deployment strategies and distributed system integrations
  • Architect monitoring, observability, and alerting systems across diverse environments
  • Drive best practices for infrastructure as code, configuration management, and deployment automation

Enterprise Platform Development

  • Partner with product and engineering teams to influence the development of advanced deployment capabilities
  • Collaborate with cross-functional teams to help build systems for seamless installation, upgrade, and rollback processes across various environments
  • Influence the design and help implement comprehensive monitoring and health check systems for distributed deployments
  • Partner with engineering teams to help develop self-healing and automated remediation capabilities

Platform Reliability & Operations

  • Establish and maintain SLAs/SLOs for both cloud and enterprise offerings
  • Lead incident response and post-mortem processes to drive continuous improvement
  • Implement chaos engineering practices to proactively identify system weaknesses
  • Optimize system performance, capacity planning, and cost efficiency

Team Leadership & Collaboration

  • Mentor and guide a team of SREs and collaborate with platform engineering teams
  • Work closely with product, engineering, and customer success teams to ensure reliable product delivery
  • Improve on-call practices, runbooks, and knowledge sharing processes
  • Drive cross-functional initiatives to improve overall system reliability

Required Qualifications

  • 8+ years of experience in Site Reliability Engineering, Platform Engineering, or DevOps roles
  • 3+ years of technical leadership experience managing engineering teams
  • Strong expertise with cloud platforms (AWS, GCP, Azure) and infrastructure automation tools
  • Proficiency in containerization technologies (Docker, Kubernetes) and orchestration
  • Experience with infrastructure as code tools (Terraform, CloudFormation, Pulumi)
  • Strong programming skills in Python, Java, or similar languages
  • Deep understanding of monitoring and observability tools (Prometheus, Grafana, Datadog, etc.)
  • Experience with CI/CD pipelines and deployment automation
  • Strong knowledge of networking, security, and database operations in cloud environments

Preferred Qualifications

  • Experience building and operating multi-tenant SaaS platforms
  • Background in developing customer-facing deployment and management tools
  • Knowledge of data infrastructure and metadata management systems
  • Experience with service mesh technologies and microservices architectures
  • Previous experience in a customer-facing technical role or working with enterprise clients
  • Experience with data governance or data catalog platforms

What You'll Build

  • A robust management control plane for enterprise-grade deployments
  • Automated deployment pipelines supporting multiple cloud providers and customer environments
  • Comprehensive monitoring and alerting systems with customer-facing dashboards
  • Self-service tools for customers to manage their DataHub installations
  • Disaster recovery and backup solutions for enterprise deployments
  • Performance optimization and scaling solutions for high-volume data workloads

Benefits and Perks

We invest in people so they can do their best work and enjoy doing it. Our benefits reflect the way we build: practical, thoughtful, and designed to support long-term growth.

Competitive compensation

We offer salaries that reflect your skills, experience, and the impact you make. You bring value—we make sure you're recognized for it.

Equity for everyone

Every team member receives an ownership stake in the company. When we grow, you grow with us.

Hybrid work

This is a hybrid role with the expectation that you will work from our Palo Alto office a minimum of three days per week.

Location flexibility

Home office, coworking space, or something in between? We support your ideal setup. You’ll receive a monthly coworking stipend to use whenever you need a change of pace or in-person collaboration time.

Comprehensive health coverage

Your well-being matters. We cover 99% of medical, dental, and vision premiums for employees, and 65% for dependents.

Flexible savings accounts

We offer FSAs to help cover planned or unexpected healthcare costs. You can also opt into a Dependent Care FSA to support family needs.

Support for every path to parenthood

Through Carrot Fertility, we provide inclusive fertility benefits and family-forming support. All U.S. employees have access, regardless of age, gender identity, or family structure.

Time off that works for you

We trust you to take the time you need. Our unlimited PTO and sick leave policy is designed for flexibility, rest, and real life.

Why Join Us

DataHub is at a rare inflection point: we’ve achieved product-market fit, earned the trust of leading enterprises, and secured backing from top-tier investors like Bessemer Venture Partners and 8VC. The context platform market is expected to grow from $1B to $9B in the next five years—and we’re leading the way.

By joining our team, you’ll:

  • Tackle high-impact challenges at the heart of enterprise AI infrastructure
  • Ship production systems that power real-world use cases at global scale
  • Collaborate with a high-caliber team of builders who’ve scaled some of the most influential data tools in the world
  • Build the next generation of AI-native data systems, including conversational agents, intelligent classification, automated governance, and more

If you're passionate about technology, enjoy working with customers, and want to be part of a fast-growing company changing the industry, we want to hear from you!

Vacancy posted more than 2 months ago