Site Reliability Engineer, Infrastructure - Analytics Platform

OpenAI

About the Team

The Scaling team designs, builds, and operates critical infrastructure that enables research at OpenAI. Our mission is simple: accelerate the progress of research towards AGI. We do this by building core systems that researchers rely on, ranging from low-level infrastructure components to research-facing custom applications. These systems must scale with the increasing complexity and size of our workloads, while remaining reliable and easy to use.

About the Role

We're looking for an experienced Site Reliability Engineer to own production-critical infrastructure end to end. This role is centered on data-heavy, low-latency workloads, with emphasis on operating large-scale ClickHouse clusters, high-throughput Kafka pipelines, and reliable integrations with Snowflake. You'll turn ambiguous operational problems into clear plans, ship pragmatic solutions quickly, and improve them through production feedback and iteration. We are specifically looking for someone who can independently define and raise operational standards across teams while remaining deeply hands-on in production systems.

In this role, you will

  • Own infrastructure lifecycle management across provisioning, upgrades, scaling, and decommissioning (IaC-first).
  • Operate and scale ClickHouse clusters, including sharding, replication, capacity planning, performance tuning, and maintenance.
  • Operate Kafka as the ingestion backbone, improving throughput, lag, backpressure handling, and failure recovery.
  • Improve end-to-end latency and reliability for data-heavy serving and query workloads.
  • Build and maintain strong monitoring and alerting: SLIs/SLOs, dashboards, alert policies, and actionable runbooks.
  • Define, implement, and continuously improve incident response standards, on-call practices, and postmortem quality.
  • Own backup/restore and disaster recovery strategy, including regular recovery drills.
  • Plan and execute safe rollouts across multiple environments (dev/stage/prod), including canary and rollback strategies.
  • Partner day to day with software engineers, embedding reliability into design, implementation, and release processes.
  • Set the quality bar for operational readiness and runbook standards, and drive adoption across teams.
  • Improve CI/CD pipelines and DevEx for faster, safer, and more predictable releases.
  • Strengthen security posture across infrastructure and delivery systems (least privilege, secrets management, patching, supply-chain controls).

You might thrive in this role if you have

  • A track record of owning production infrastructure for data-heavy, low-latency systems end to end.
  • Strong hands-on experience operating ClickHouse, Kafka, and adjacent large-scale data systems.
  • Practical experience with Snowflake workflows and cross-system data architecture.
  • The ability to independently define operational standards (runbooks, incident process, rollout safety) and make them stick.
  • Strong operational experience with Kubernetes, Terraform, and cloud infrastructure.
  • Excellent communication and collaboration skills; you work effectively across engineering and research teams.
  • High personal rigor and organization in high-pressure production environments.
  • A deeply hands-on mindset: willing to debug incidents, tune systems, and implement fixes directly.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.
We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. For additional information, please see OpenAI’s Aff…Statement

Background checks for applicants will be administered in accordance with applicable law, and qualified applicants with arrest or conviction records will be considered for employment consistent with those laws, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act, for US-based candidates.

For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.

To notify OpenAI that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.

Vacancy posted 1 day ago
Similar jobs that could be interesting for you
Based on the Site Reliability Engineer, Infrastructure - Analytics Platform vacancy in San Francisco, CA
  •  ...builds, and operates critical infrastructure that enables research at...  ...workloads, while remaining reliable and easy to use. About the Role...  ...for a staff-level software engineer to own production-critical infrastructure...  ...or infrastructure for analytics, telemetry, logging, search,... 
    Platform
    Work at office
    Relocation package

    OpenAI

    San Francisco, CA
    9 hours ago
  • $190k - $280k

     ...tools. About the role The Event Analytics Platform (EAP) team is responsible for the infrastructure that powers all of Sentry's...  ...generate. As a Senior Software Engineer, you will lead efforts to push...  ...data at world-class speed and reliability. Architect and automate services... 
    Platform
    Work at office
    Relocation

    Sentry

    San Francisco, CA
    1 day ago
  • A leading data and analytics firm is seeking a Full-Stack Engineer to build and evolve their Cube Cloud platform. You will work across frontend and backend, designing APIs and enhancing user experiences. Ideal candidates will have expertise in Node.js, REST/GraphQL APIs... 
    Platform
    Remote work
    Worldwide

    Cube Dev, Inc.

    San Francisco, CA
    4 days ago
  • An educational technology company is seeking a Staff Analytics Engineer to take ownership of analytics data management. You will enhance data...  ...in analytics engineering and a passion for building reliable data systems. The role allows for flexibility across locations... 
    Platform

    ClassDojo

    San Francisco, CA
    4 days ago
  •  ...A leading identity platform company in San Francisco is seeking an experienced Data Infrastructure Engineer. You will design and maintain data platforms, implement data models...  ..., and collaborate across teams to drive analytics innovation. Ideal candidates have 3+ years... 
    Platform

    Persona

    San Francisco, CA
    9 hours ago
  •  ...A leading government services provider in San Francisco is seeking a PostgreSQL Database Developer to support a data analytics platform. The successful candidate will manage the migration from Oracle to Postgres and ensure the scalability of our data systems. Ideal applicants... 
    Platform

    Contact Government Services, LLC

    San Francisco, CA
    9 hours ago
  •  ...AI company in San Francisco is looking for a Data Engineer to build and scale its internal data platform. The successful candidate will design data models,...  ...position requires experience in data engineering, analytics, and familiarity with tools like Apache Beam and Kafka... 
    Platform

    Baseten

    San Francisco, CA
    2 days ago
  •  ...A technology-driven company in San Francisco is looking for a senior engineer to develop an internal data agent system that enables easy access to analytics across teams. The ideal candidate will have experience in data modeling and pipelines, along with strong problem... 
    Platform
    Flexible hours

    Collective

    San Francisco, CA
    9 hours ago
  • Cursor is seeking an Analytics Platform Engineer to take ownership of data foundations and work on optimizing their data lakehouse infrastructure. In this role, you will collaborate with various teams to improve data reliability and performance. The ideal candidate will... 
    Platform

    Cursor

    San Francisco, CA
    3 days ago
  • A global payments technology leader is seeking a Hybrid Data Specialist to enhance Visa Marketing 360, focused on data engineering and analytics. You will build and optimize a scalable data foundation, integrate multiple data sources, and develop dashboards for marketing... 
    Platform

    Visa

    San Francisco, CA
    4 days ago
  •  ...Site Reliability Engineer - AI Infrastructure Location: Global Remote / San Francisco · Full-Time About Andromeda Andromeda Cluster was founded by Nat...  ...deliver compute when and where it’s needed most. Our platform routes training and inference jobs across global supply... 
    Platform
    Full time
    Remote work

    Andromeda Cluster

    San Francisco, CA
    9 hours ago
  • $220k - $331k

    Amplitude is seeking a Staff Fullstack Engineer for its Core Analytics team in San Francisco. The successful candidate will have over 7 years of experience in backend and frontend development, be involved in performance optimization and mentoring, and drive critical projects... 
    Platform

    Amplitude

    San Francisco, CA
    5 days ago
  •  ...A digital identity platform company in San Francisco is looking for a Data Infrastructure Engineer to design, build, and maintain their data platform. The role requires...  ...will collaborate with various teams to enhance analytics and core product features, identify transformative... 
    Platform

    Persona

    San Francisco, CA
    9 hours ago
  •  ...Senior Site Reliability Engineer - AI Infrastructure Location: Global Remote / San Francisco • Full-Time About Andromeda Andromeda Cluster was founded...  ...deliver compute when and where it’s needed most. Our platform routes training and inference jobs across global supply... 
    Platform
    Full time
    Remote work

    Cortes 23

    San Francisco, CA
    9 hours ago
  • A data-driven tech company in San Francisco is seeking an Analytics Engineer to take ownership of their data stack and support various teams, including Marketing and Sales, with reliable metrics. The role entails building and maintaining dbt projects, implementing ETL... 
    Platform
    Flexible hours

    FullEnrich

    San Francisco, CA
    2 days ago
  • Aera Technology in Mountain View, California is seeking Application Developers to build data analytics models and processing code for cognitive applications. The role involves working with ETL processes, creating workflows in Java and Python, and developing reports. Candidates... 
    Platform
    Flexible hours

    Aera-Technology

    San Francisco, CA
    1 day ago
  • $125k - $195k

     ...small team of exceptional, hands-on engineers to make this happen. Mechanical, electrical...  .... About the role We are seeking an Infrastructure & Site Reliability Engineer to design, build, deploy,...  ..., VPNs Scale our observability platform: Build systems to ingest and display... 
    Platform
    Work at office
    Visa sponsorship
    Night shift

    Atomicsemi

    San Francisco, CA
    9 hours ago
  • $140k - $205k

     ...Senior Technology Site Reliability Engineer Cooley is seeking a Senior Site...  ...Reliability Engineer to join the Infrastructure & Development...  ...~ Deep expertise in cloud platforms, particularly AWS, and container...  ...Competencies : Expert analytical/quantitative, problem-... 
    Platform
    Full time
    Temporary work
    Work at office
    Flexible hours
    Weekend work

    Cooley

    San Francisco, CA
    3 days ago
  •  ...leading FinTech company is looking for an Engineering Manager to lead the Data Storage team....  ...the architecture that supports both analytics and product experiences. Ideal...  ...experience in software development, data infrastructure, and team management. This full-time position... 
    Platform
    Full time

    Menlo Ventures

    San Francisco, CA
    2 days ago
  • $268k - $368.5k

     ...technology wholesale platform built on the belief that...  ...this role Our Engineering organization owns the...  ..., Strategy, Analytics, Finance, etc. Enabling...  ...concern about underlying infrastructure. We enable product engineering...  ...operates a secure, reliable, cost-efficient,... 
    Platform
    Work experience placement
    Work at office
    Local area
    Remote work
    Monday to Friday
    Flexible hours
    3 days per week

    Faire Inc

    San Francisco, CA
    5 days ago
  •  ...inventive research, design, and engineering. Our organization is very...  ...As one of Cursor’s first Analytics Platform Engineers, you’ll own the...  ...company‑wide data work reliable, secure, and easy to build...  ...repeated data needs into durable infrastructure. Cursor is already... 
    Platform
    Full time
    Immediate start

    Cursor

    San Francisco, CA
    3 days ago
  •  ...Judgment Labs builds infrastructure for Agent Behavior Monitoring...  ..., and pinpoint where reliability breaks down. We've...  ...Data Infrastructure Engineer to build and scale...  ...ingestion through analytics. What You'll Do: Design...  ...processing, or event stream platforms (Datadog, Honeycomb,... 
    Platform

    Judgment Labs Inc.

    San Francisco, CA
    8 hours ago
  • A leading tech company in San Francisco is seeking an Azure Data Engineer to design, build, and maintain cloud-based data pipelines and analytics platforms using Azure services. The ideal candidate will have significant experience with Azure technologies, a strong background... 
    Platform

    TechDigital Group

    San Francisco, CA
    3 days ago
  •  ...Job Description Cloud Data Engineer – AI & Analytics Transformation Location: US / Canada (Remote/Hybrid) Type: Contract...  ...and implement modern data architectures using cloud platforms ~ Build and optimize ETL/ELT pipelines for large-scale...
    Platform
    Full time
    Contract work
    Remote work

    NavitasPartners

    San Francisco, CA
    13 days ago
  •  ...A tech company specializing in HR and IT solutions is seeking a Senior Staff Front-End Engineer to lead the architecture of their analytics platform. This role involves overseeing the development of high-performance visualization systems that handle large datasets and... 
    Platform

    Rippling

    San Francisco, CA
    9 hours ago
  • $220k - $235k

     ...the future of our cloud platform and champion engineering excellence across Ironclad...  ...direction for the Site Reliability Engineering team and our...  ...Ability to build resilient infrastructure Modern GitOps - Experience...  ...Zed Troubleshooting and analytical skills, can PR review... 
    Platform
    Full time
    Work at office

    Jobr

    San Francisco, CA
    10 hours ago
  • $200k - $250k

    A leading technology firm is looking for a Director of Analytics Engineering in Salt Lake City to develop a world-class analytics platform and manage a high-performing team. The ideal candidate has 8+ years of experience, expert SQL knowledge, and familiarity with BI tools... 
    Platform
    Flexible hours

    Flex

    San Francisco, CA
    2 days ago
  • $138k - $179k

     ...day operation and availability of the platform for our global client base. To keep...  ...a wide variety of other teams from infrastructure and engineering, to QA and business teams, so strong...  ...language. Our research-based data, analytics and indexes, supported by advanced technology... 
    Platform
    Flexible hours

    MSCI

    San Francisco, CA
    9 hours ago
  •  ...Airwallex is seeking a Senior Site Reliability Engineer in San Francisco, California, to work with...  ...to build and maintain robust cloud infrastructure. In this role, you will lead critical...  ..., and possesses expertise in cloud platforms and incident response. This position...
    Platform

    Airwallex-

    San Francisco, CA
    1 day ago
  • $180k - $210k

     ...time Location Type Remote Department Tech Engineering Compensation $180K – $210K • Offers...  ...and multimodal AI. About the Role As an Infrastructure Engineer at TwelveLabs, you will design...  ...infrastructure that allows our AI SaaS platform to operate stably and scale effectively... 
    Platform
    Full time
    H1b
    Work at office
    Remote work
    Worldwide
    Visa sponsorship
    Flexible hours

    Twelve Labs

    San Francisco, CA
    1 day ago
