
Head of Cloud Platform & SRE — Multi-Cloud & Observability

Baseten

Baseten in San Francisco is looking for a Senior Manager of Cloud Platform and Site Reliability to lead and grow the organization responsible for their machine learning platform infrastructure. The role requires managing team leads, setting technical direction, and ensuring the reliability of cloud operations. Ideal candidates have strong technical expertise in Kubernetes, cloud infrastructure, and proven incident management skills. They will contribute to establishing standards for service reliability and drive cross-functional collaboration with product and engineering teams.

Vacancy posted 4 days ago
Similar jobs that could be interesting for you
Based on the Head of Cloud Platform & SRE — Multi-Cloud & Observability vacancy in San Francisco, CA
  •  ...seeking a Director of Site Reliability Engineering to lead a dynamic SRE team. This senior role involves shaping engineering culture while...  ...build. This position requires a breadth of experience in SRE, cloud technologies, as well as strong leadership and communication... 
    Platform

    P2P

    San Francisco, CA
    1 day ago
  •  ...Overview: Job Title: Observability Architect (Dynatrace) Location...  ..., infrastructure, cloud platforms, and containerized environments...  ...coverage across hybrid and multi-cloud environments. Cloud...  ...Partner with DevOps, SRE, IT, and business teams to align... 
    Platform
    Contract work

    Purple Drive

    San Francisco, CA
    4 days ago
  • Neara is seeking a Sr. Site Reliability Engineer to design and operate the multi-cloud infrastructure powering Optura’s AI Platform. You will own systems end-to-end and partner with teams to ensure secure, scalable services. With at least 8 years in production environments... 
    Platform

    Neara

    San Francisco, CA
    4 days ago
  • $170k - $250k

     ...Site Reliability Engineer (SRE) Location: San Francisco, CA...  ...building a next-generation GPU cloud platform for enterprises, startups, and...  ...engineering to build the automation, observability, and platform infrastructure that powers their multi-cloud GPU marketplace at scale... 
    Platform
    Work at office
    Visa sponsorship
    Flexible hours

    Recruiting from Scratch

    San Francisco, CA
    21 hours ago
  • THE ROLE As Senior Manager of Cloud Platform and Site Reliability, you...  ...our cloud infrastructure and SRE practice — from coaching your...  ...escalations, to shaping the multi-year roadmap for multi-cloud...  ...infrastructure, and observability platforms. You operate at the... 
    Platform
    Temporary work
    Flexible hours

    Baseten

    San Francisco, CA
    9 days ago
  • Devops Engg /SRE Jobs in SRP Systems Inc San Francisco...  ...focus on improving platform reliability, availability...  ...of distributed (multi‑tiered) systems, algorithms...  ...Kafka. Preferred Skills Cloud platforms (AWS, Azure,...  ...etc.). Monitoring and observability tools such as Dynatrace... 
    Platform

    Sulekha.com New Media Pvt Ltd

    San Francisco, CA
    1 day ago
  •  ...Join us and help build the platform engineers turn to to ship AI...  ...processes, automations, and observability tooling that keep our platform...  ...like these as part of the SRE team: Improve Baseten...  ...the reliability of Baseten's multi-cloud Kubernetes infrastructure, including... 
    Platform
    Flexible hours

    BaseTen

    San Francisco, CA
    3 days ago
  • $172.5k - $260.1k

     ...Manager, Software Engineering - Cloud Platform Location New York, NY; San...  ...on as an afterthought. SRE Mindset: Engineering for failure...  ...999% availability standard. Observability: Relying on telemetry,...  ...zone. Deep understanding of multi-account cloud strategies, centralized... 
    Platform
    Work experience placement
    Shift work

    Centaur Labs

    San Francisco, CA
    4 days ago
  • $305k - $385k

     ...to build beneficial AI systems. About the role The Anthropic Platform Org’s mission is to help builders build. Our vision is to be the...  ...is responsible for our APIs, self-serve developer experience, multi-cloud integrations, and agentic infrastructure. We serve a wide... 
    Platform
    Work at office
    Visa sponsorship
    Flexible hours

    Anthropic

    San Francisco, CA
    3 days ago
  • $170k - $230k

     ...Site Reliability Engineer (SRE) Palo Alto / San Francisco...  ...Mithril is an AI infrastructure platform built to make GPU compute...  ...across a heterogeneous, multi-cloud environment. About the Opportunity...  ...will build the automation, observability, and tooling that allows... 
    Platform
    Work at office
    Local area
    1 day per week

    Mithril

    San Francisco, CA
    3 days ago
  •  ...DESCRIPTION The Senior DevOps & SRE Manager - Platform Reliability & Global...  ...operational excellence of a complex, multi‑platform ecosystem spanning...  ...as Code, and automation Observability and incident...  ...experience with Kubernetes, cloud platforms, and event‑driven... 
    Platform
    Work at office
    3 days per week

    Qcells North America

    San Francisco, CA
    2 days ago
  •  ...us and help build the platform engineers turn to to ship...  ...Manager for Baseten's Cloud Platform team, you will...  ...platform engineering, or SRE context (not managing...  ...HAVE Experience with OSS observability tooling (Prometheus,...  ...OpenTelemetry). Background in multi‑cloud environments or... 
    Platform
    Flexible hours

    The Consensus

    San Francisco, CA
    1 day ago
  •  ...transformation initiatives by building resilient multi-cloud infrastructure, automating deployments at scale, and driving platform reliability for enterprise SaaS products....  ...) Background in distributed systems observability with OpenTelemetry About APPIT Software Solutions... 
    Platform
    Flexible hours

    Appit LLC

    San Francisco, CA
    4 days ago
  •  ...Missionforce Operations (Private Cloud Edition)...  ...processes on a single platform. We are seeking a senior...  ...infrastructure, DevOps/SRE, customer success,...  ..., including multi-tenancy, identity and access...  ...isolation, APIs, integration, observability, release management,... 
    Platform
    For contractors
    Work at office
    Remote work
    Shift work

    Salesforce, Inc.

    San Francisco, CA
    4 days ago
  • $148.5k - $223.9k

     ...Reliability Engineer (Cloud Automation) Location:...  ...About the Team The Cloud Platform Engineering team builds...  ...on as an afterthought. SRE Mindset: Engineering...  ...availability standard. Observability: Relying on telemetry,...  ...of new, fully governed multi-account cloud environments... 
    Platform
    Work experience placement
    Shift work

    Centaur Labs

    San Francisco, CA
    4 days ago
  •  ...passionate about building and operating production-grade systems. This role involves ownership of AWS infrastructure, Kubernetes platforms, and continuous improvement efforts in a high-pressure environment. The ideal candidate has deep AWS expertise, strong coding skills... 
    Platform

    Socure

    San Francisco, CA
    1 day ago
  •  ...emphasizes collaboration across teams and requires expertise in cloud systems, CI/CD practices, and reliability metrics. Candidates...  ...automation and configuration management, experience with cloud platforms like AWS, and a strong ability to work in a distributed environment... 
    Platform
    Remote job

    Wikimedia Foundation

    San Francisco, CA
    3 days ago
  • $140k - $185k

     ...the production environment: Strengthen observability: Reduce operational toil: Support...  ...What we’re looking for 3-6+ years in SRE, DevOps, Platform, or operations-heavy engineering roles...  ...under pressure. Experience operating cloud infrastructure (AWS preferred). Working... 
    Platform
    Work at office
    Worldwide

    Dormont Manufacturing Co

    San Francisco, CA
    4 days ago
  • $166k - $225k

     ...leading data and AI company in San Francisco seeks a Senior Software Engineer to enhance their infrastructure platform. This role requires building multi-cloud systems and scalable solutions for managing data and AI workloads. Ideal candidates have a strong programming... 
    Platform
    Flexible hours

    Databricks

    San Francisco, CA
    2 days ago
  •  ...about this opportunity, feel free to reach out and apply today! Responsibilities Architect and implement a secure, scalable cloud platform meeting FedRAMP High and DoD IL5 standards. Oversee the integration of physical infrastructure with cloud orchestration,... 
    Platform
    Remote work

    Hamilton Barnes Associates Limited

    San Francisco, CA
    4 days ago
  • $300k

     ...mode startup building out their AI and cloud platform, powered by thousands of H100s, H200s,...  ...and Kubernetes environments. Develop observability, alerting, and auto-healing systems for...  ...Must Have: 7+ years of experience in SRE, DevOps, or Infrastructure Engineering... 
    Platform

    Hamilton Barnes Associates Limited

    San Francisco, CA
    1 day ago
  •  ...will work at the intersection of cloud infrastructure, Kubernetes, automation, and observability, with a strong focus on...  ...improvement of Kubernetes (EKS) platforms Reliability of production systems...  ...Experience troubleshooting complex, multi-layer Kubernetes issues... 
    Platform
    Remote work

    Socure Inc

    San Francisco, CA
    4 days ago
  •  ...Site Reliability Engineer (SRE) CloudDevs works with fast-moving...  ...system reliability, efficiency, and observability. Outline and monitor SLIs, SLOs...  ...5+ years in SRE, DevOps, or Platform Engineering roles. Strong expertise with cloud infrastructure (AWS preferred... 
    Platform

    The10minutecareersolution

    San Francisco, CA
    1 day ago
  •  ...the enterprise sustainability platform. Companies like Airbnb,...  ...Engineering Manager for our Cloud Infrastructure team to help lead...  ...infrastructure that powers Watershed’s multi-region deployment on GCP,...  ...database architecture, observability, reliability & SLOs, cloud security... 
    Platform
    Work at office
    Remote work

    Watershed

    San Francisco, CA
    1 day ago
  • Hyperbolic Labs, based in San Francisco, seeks a Platform Engineer to design the control plane for our innovative GPU marketplace. This...  ...for implementing identity management, billing systems, and multi-cloud abstractions that enhance developer experiences. An expert in... 
    Platform

    Hyperbolic Labs

    San Francisco, CA
    4 days ago
  • B Capital is seeking a Systems Engineer to join its Compute Platform team in San Francisco. This role involves maintaining a K8s-based...  ...systems challenges, focusing on GPU infrastructures and multi-cloud environments. The ideal candidate has extensive experience in... 
    Platform

    B Capital

    San Francisco, CA
    2 days ago
  • Zyphra in San Francisco is hiring a Platform Engineer responsible for designing and maintaining robust infrastructure. You will collaborate with teams to enhance system observability, manage cloud environments and ensure deployment safety. The ideal candidate has strong... 
    Platform

    Zyphra

    San Francisco, CA
    1 day ago
  •  ...layer 1 blockchain and developer platform that connects any L1 and L2,...  ...you. Experience: 3+ years of cloud infrastructure experience 2+...  ...enjoy building testing and observability capabilities that will accelerate...  ...processes. DevOps Engineer/SRE Transitioning to Blockchain... 
    Platform
    Remote job

    Blockchain Works

    San Francisco, CA
    21 days ago
  • $325k - $405k

     ...Software Engineer, Security Observability to join our Security team. In...  ...data systems to ensure high platform availability. Collaborate closely...  ...Terraform and working with cloud platforms such as Azure....  ...site reliability engineering (SRE), or security. The ability to... 
    Platform
    Remote work
    Relocation package

    Slope

    San Francisco, CA
    1 day ago
  • $177.19k - $364.8k

     ...Software Engineer to join our Observability team at Pinterest. This role...  ...collaboration: Partner with SRE, Infrastructure, Product Engineering...  ...Experience building internal platforms or tools with strong adoption...  ...solutions. Familiarity with cloud‑native architectures and... 
    Platform
    Work at office
    Relocation
    Relocation package

    jobr.pro

    San Francisco, CA
    4 days ago
