Principal SRE: AI Cloud Reliability Architect

Dormont Manufacturing Co

Dormont Manufacturing Co in San Francisco is searching for a Principal Site Reliability Engineer to lead the design and reliability of a next-gen NeoCloud platform. You will define reliability architecture and oversee incident response, ensuring high performance and efficiency in distributed systems. The ideal candidate brings over 10 years of experience, especially in SRE principles and cloud environments. The role offers competitive pay and industry-standard benefits.

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the Principal SRE: AI Cloud Reliability Architect in San Francisco, CA vacancy
  • $261k - $326k

    A technology company specializing in AI infrastructure is seeking a Principal Engineer to enhance reliability and scalability of cloud systems. This role demands over 15 years of experience in production engineering or related fields and involves setting technical directions... 
    Principal

    Crusoe

    San Francisco, CA
    4 days ago
  • Epoch Biodesign is looking for a Principal Site Reliability Engineer in San Francisco, CA. The role involves...  ...a next-generation NeoCloud for AI and GPU workloads. Candidates should have...  ...distributed systems and possess strong skills in SRE principles, Kubernetes, and programming... 
    Principal

    Epoch Biodesign

    San Francisco, CA
    4 days ago
  • $183k - $250k

    Crusoe Energy in San Francisco, California, is seeking a Site Reliability Engineer (SRE) with extensive experience in infrastructure design and high-quality coding. The role offers a hybrid work schedule, competitive compensation between $183,000 and $250,000, and a comprehensive... 
    Suggested

    Dormont Manufacturing Co

    San Francisco, CA
    1 day ago
  • $180k - $250k

    A prominent tech company based in San Francisco is seeking a seasoned Site Reliability Engineer (SRE) to oversee the reliability and availability of customer-facing systems. You will manage Kubernetes clusters, build CI/CD pipelines, and leverage automation to enhance production... 
    Suggested
    Visa sponsorship

    Fal

    San Francisco, CA
    2 days ago
  • Qcells North America is seeking a Senior DevOps & SRE Manager to oversee the reliability and operational excellence of complex platforms. Candidates should...  ...-making environments and are comfortable leveraging AI tools in their workflows.
    Suggested

    Qcells North America

    San Francisco, CA
    2 days ago
  • Prosper in San Francisco is looking for a Senior Technical Contributor in the SRE Team. This role focuses on maintaining the reliability, scalability, and security of Prosper’s Cloud Platform portfolio, blending platform engineering with site reliability engineering. The... 

    Prosper

    San Francisco, CA
    5 days ago
  • $200k - $250k

     ...visual creation platform that combines modern web tooling with AI-powered workflows. Our stack includes React/TypeScript frontend,...  ...owner of stability and infrastructure to ensure the platform is reliable, fast, and resilient as we scale. Role Mission Own service reliability... 
    Permanent employment

    Vizcom

    San Francisco, CA
    4 days ago
  • $300k

     ...stealth-mode startup building out their AI and cloud platform, powered by thousands of H100s,...  ...inference. As a Platform Engineer/Senior Site Reliability Engineer, you’ll own the reliability,...  .../ Must Have: 7+ years of experience in SRE, DevOps, or Infrastructure Engineering... 

    Hamilton Barnes Associates Limited

    San Francisco, CA
    1 day ago
  • $300 per month

    About This Role As a Principal Site Reliability Engineer, you will play...  ...generation NeoCloud built for AI, GPU, and high-...  ..., and ensure the cloud scales safely, efficiently...  ...long-term remediation Architect and improve...  ...engineers across the SRE and infrastructure org... 
    Principal
    Temporary work

    Dormont Manufacturing Co

    San Francisco, CA
    2 days ago
  •  ...San Francisco is seeking an experienced DevOps Engineer/Site Reliability Engineer (SRE) to join its team. The ideal candidate will have over 8...  ...for production services. Candidates should be familiar with cloud platforms such as AWS, Azure, or GCP, and have expertise in... 

    Sulekha.com New Media Pvt Ltd

    San Francisco, CA
    5 days ago
  •  ...DevOps Engineer to enhance the reliability of their production systems....  ...over 5 years of experience in SRE or DevOps with strong knowledge in observability stacks and cloud platforms. Join us in our mission...  ...design through innovative AI solutions.

    Flux Enterprise

    San Francisco, CA
    4 days ago
  • deCircle is seeking a Site Reliability Engineer based in San Francisco to ensure operational excellence for our GPU marketplace and AI infrastructure. The role involves defining service level objectives, managing capacity for a distributed system, and ensuring security... 

    deCircle

    San Francisco, CA
    3 days ago
  • THE ROLE As Senior Manager of Cloud Platform and Site Reliability, you will lead and grow the org responsible...  ...health of our cloud infrastructure and SRE practice — from coaching your leads...  ...Familiarity with running high‑performance AI models and workloads, including... 
    Temporary work
    Flexible hours

    Baseten

    San Francisco, CA
    9 days ago
  • $159.8k - $235k

     ...looking for a Software Engineer for its Reliability Platform team in San Francisco. This role...  ...particularly in Go, and familiarity with cloud infrastructure like AWS and tools like Terraform...  ...through automation and the use of AI tools, ensuring the success of over 4,000... 

    Fairygodboss

    San Francisco, CA
    5 days ago
  • Slope in San Francisco is looking for a reliability engineer focused on managing call completion for its Voice AI platform. You will be key in establishing incident management processes and improving system stability through effective monitoring and capacity planning. Candidates... 

    Slope

    San Francisco, CA
    1 day ago
  • $162.6k - $302k

     ...ecosystems. As a Site Reliability Engineer in the Solutions...  ..., and supportable cloud‑based platform solutions...  ...Design and Implementation Architect and implement IaC solutions...  ...Engineering (SRE). Proven expertise in supporting...  ...AWS SageMaker, Google AI Platform, or Azure ML.... 
    Principal
    Local area
    Relocation package
    3 days per week

    F. Hoffmann-La Roche AG

    South San Francisco, CA
    4 days ago
  • $144k - $240k

    Lila Sciences is seeking a Sr Principal / Principal Software Engineer to join their innovative team in San Francisco, CA. You will design and build AI-driven applications, focusing on performance, reliability, and cross-functional collaboration with scientists. Ideal candidates... 
    Principal
    Flexible hours

    Jobr

    San Francisco, CA
    4 days ago
  • Principal Engineer, AI Platform & Infrastructure About the Role SPREEAI is building the future of AI...  ...to move from research prototypes to reliable, production‑grade deployments powering...  ...Python, PyTorch, Kubernetes, Docker, cloud infrastructure, and GPU-based workloads... 
    Principal

    SpreeAI

    San Francisco, CA
    1 day ago
  • The Principal AI Platform Engineer at Nextdata designs and builds interfaces, systems, and agents that make governed enterprise data usable...  ...its meaning, request access, execute safe actions, and return reliable answers with context, lineage, and policy enforcement.... 
    Principal

    Nextdata

    San Francisco, CA
    5 days ago
  • $159.8k - $235k

    About the Team The Reliability Platform role is a key pillar of DoorDash...  ...the pragmatic perspective of an SRE, and deliver solutions with...  ...frameworks, analytics tools, and AI Agent enablement to extract...  ...resilient, performant, and efficient. Cloud/Infra Expertise: You're... 
    Hourly pay
    Work at office
    Local area
    Flexible hours

    Fairygodboss

    San Francisco, CA
    5 days ago
  •  ...identity security, delivering an AI‑powered platform that governs...  ...customers. As we scale globally, reliability, availability, and performance are...  ...using Go (Golang) or Python. Architect and optimize foundational solutions within Cloud environments (AWS, Azure, etc.),... 
    Principal

    Saviynt

    San Francisco, CA
    4 days ago
  • Speakeasy in San Francisco is looking for a candidate to enhance product reliability and performance by collaborating with a dynamic team. The role involves identifying architectural changes, fostering a reliable culture, and participating in on-call rotations. The ideal... 

    Speakeasy

    San Francisco, CA
    4 days ago
  • A leading language learning platform is seeking an experienced SRE Engineer to ensure the reliability and resilience of their infrastructure. Responsibilities include leading incident response, improving observability, and collaborating with various teams to enhance platform... 

    Speak

    San Francisco, CA
    2 days ago
  • LiveRamp is looking for a Senior Staff Site Reliability Engineer based in San Francisco, California, who will set the technical direction...  ...global infrastructure. This role includes defining and owning the SRE strategy while mentoring staff engineers and overseeing... 

    LiveRamp

    San Francisco, CA
    4 days ago
  • $256k - $320k

     ...the only vertically integrated AI infrastructure company built...  ...center construction, and cloud services. If you want to do...  ...This Role We are looking for a Principal Software engineer to help design...  ...team, considering reliability, scalability, operational costs... 
    Principal
    Temporary work

    Epoch Biodesign

    San Francisco, CA
    5 days ago
  • Algora Public Benefit Corporation is looking for an AI Cloud Infra Engineer to join their team in San Francisco. You will ensure the reliability of backend systems and work closely with engineers to plan for future growth. The ideal candidate has strong cloud infrastructure... 

    Algora Public Benefit Corporation

    San Francisco, CA
    5 days ago
  • Megaport is hiring a Senior Platform Engineer to enhance production reliability and ensure robust systems in a supportive environment. This...  ...teams globally, championing DevOps practices, and working on cloud infrastructure technologies like AWS and Kubernetes. Ideal candidates... 

    Megaport

    Brisbane, CA
    4 days ago
  • $193.8k - $285k

    About the Team The Reliability Platform role is a key pillar of DoorDash...  ...the pragmatic perspective of an SRE, and deliver solutions with...  ...frameworks, analytics tools, and AI Agent enablement to extract...  ...Manage the team's budget for Cloud Provider Infra and 3rd party vendor... 
    Hourly pay
    Work at office
    Local area
    Remote work
    Flexible hours

    Fairygodboss

    San Francisco, CA
    1 day ago
  • Senior Principal Front-End Network Engineer, AI Infrastructure Operations Houston; New York...  ...Nscale Nscale is the GPU cloud engineered for AI. We provide...  ...is focused on owning the reliability, scalability, and long-...  ...Partnering with SRE, Compute Platform, Storage... 
    Principal
    Flexible hours

    Nscale

    San Francisco, CA
    2 days ago
  •  ...Biodesign in San Francisco is seeking a Senior Staff Cloud Support Engineer to lead technical escalations and...  ...decisions while ensuring high availability for AI workloads. The ideal candidate has over 8 years of SRE experience, deep knowledge of Kubernetes, and strong... 

    Epoch Biodesign

    San Francisco, CA
    3 days ago
