Principal SRE: AI Cloud Reliability Architect
Dormont Manufacturing Co
Dormont Manufacturing Co in San Francisco is searching for a Principal Site Reliability Engineer to lead the design and reliability of a next-gen NeoCloud platform. You will define reliability architecture and oversee incident responses, ensuring high performance and efficiency in distributed systems. The ideal candidate brings over 10 years of experience, especially in SRE principles and cloud environments. The role offers competitive pay and industry-standard benefits.
$261k - $326k
A technology company specializing in AI infrastructure is seeking a Principal Engineer to enhance reliability and scalability of cloud systems. This role demands over 15 years of experience in production engineering or related fields and involves setting technical directions...
Principal
- Epoch Biodesign is looking for a Principal Site Reliability Engineer in San Francisco, CA. The role involves... ...a next-generation NeoCloud for AI and GPU workloads. Candidates should have... ...distributed systems and possess strong skills in SRE principles, Kubernetes, and programming...
Principal
$183k - $250k
Crusoe Energy in San Francisco, California, is seeking a Site Reliability Engineer (SRE) with extensive experience in infrastructure design and high-quality coding. The role offers a hybrid work schedule, competitive compensation between $183,000 and $250,000, and a comprehensive...
$180k - $250k
A prominent tech company based in San Francisco is seeking a seasoned Site Reliability Engineer (SRE) to oversee the reliability and availability of customer-facing systems. You will manage Kubernetes clusters, build CI/CD pipelines, and leverage automation to enhance production...
Visa sponsorship
- Qcells North America is seeking a Senior DevOps & SRE Manager to oversee the reliability and operational excellence of complex platforms. Candidates should... ...-making environments and are comfortable leveraging AI tools in their workflows.
- Prosper in San Francisco is looking for a Senior Technical Contributor in the SRE Team. This role focuses on maintaining the reliability, scalability, and security of Prosper’s Cloud Platform portfolio, blending platform engineering with site reliability engineering. The...
$200k - $250k
...visual creation platform that combines modern web tooling with AI-powered workflows. Our stack includes React/TypeScript frontend,... ...owner of stability and infrastructure to ensure the platform is reliable, fast, and resilient as we scale. Role Mission Own service reliability...
Permanent employment
$300k
...stealth-mode startup building out their AI and cloud platform, powered by thousands of H100s,... ...inference. As a Platform Engineer/Senior Site Reliability Engineer, you’ll own the reliability,... .../ Must Have: 7+ years of experience in SRE, DevOps, or Infrastructure Engineering...
$300 per month
About This Role As a Principal Site Reliability Engineer, you will play... ...generation NeoCloud built for AI, GPU, and high-... ..., and ensure the cloud scales safely, efficiently... ...long-term remediation Architect and improve... ...engineers across the SRE and infrastructure org...
Principal, Temporary work
- ...San Francisco is seeking an experienced DevOps Engineer/Site Reliability Engineer (SRE) to join its team. The ideal candidate will have over 8... ...for production services. Candidates should be familiar with cloud platforms such as AWS, Azure, or GCP, and have expertise in...
- ...DevOps Engineer to enhance the reliability of their production systems.... ...over 5 years of experience in SRE or DevOps with strong knowledge in observability stacks and cloud platforms. Join us in our mission... ...design through innovative AI solutions. Flux...
- deCircle is seeking a Site Reliability Engineer based in San Francisco to ensure operational excellence for our GPU marketplace and AI infrastructure. The role involves defining service level objectives, managing capacity for a distributed system, and ensuring security...
- THE ROLE As Senior Manager of Cloud Platform and Site Reliability, you will lead and grow the org responsible... ...health of our cloud infrastructure and SRE practice, from coaching your leads... ...Familiarity with running high‑performance AI models and workloads, including...
Temporary work, Flexible hours
$159.8k - $235k
...looking for a Software Engineer for its Reliability Platform team in San Francisco. This role... ...particularly in Go, and familiarity with cloud infrastructure like AWS and tools like Terraform... ...through automation and the use of AI tools, ensuring the success of over 4,000...
- Slope in San Francisco is looking for a reliability engineer focused on managing call completion for its Voice AI platform. You will be key in establishing incident management processes and improving system stability through effective monitoring and capacity planning. Candidates...
$162.6k - $302k
...ecosystems. As a Site Reliability Engineer in the Solutions... ..., and supportable cloud‑based platform solutions... ...Design and Implementation Architect and implement IaC solutions... ...Engineering (SRE). Proven expertise in supporting... ...AWS SageMaker, Google AI Platform, or Azure ML....
Principal, Local area, Relocation package, 3 days per week
$144k - $240k
Lila Sciences is seeking a Sr Principal / Principal Software Engineer to join their innovative team in San Francisco, CA. You will design and build AI-driven applications, focusing on performance, reliability, and cross-functional collaboration with scientists. Ideal candidates...
Principal, Flexible hours
- Principal Engineer, AI Platform & Infrastructure About the Role SPREEAI is building the future of AI... ...to move from research prototypes to reliable, production‑grade deployments powering... ...Python, PyTorch, Kubernetes, Docker, cloud infrastructure, and GPU-based workloads...
Principal
- The Principal AI Platform Engineer at Nextdata designs and builds interfaces, systems, and agents that make governed enterprise data usable... ...its meaning, request access, execute safe actions, and return reliable answers with context, lineage, and policy enforcement....
Principal
$159.8k - $235k
About the Team The Reliability Platform role is a key pillar of DoorDash... ...the pragmatic perspective of an SRE, and deliver solutions with... ...frameworks, analytics tools, and AI Agent enablement to extract... ...resilient, performant, and efficient. Cloud/Infra Expertise: You're...
Hourly pay, Work at office, Local area, Flexible hours
- ...identity security, delivering an AI‑powered platform that governs... ...customers. As we scale globally, reliability, availability, and performance are... ...using Go (Golang) or Python. Architect and optimize foundational solutions within Cloud environments (AWS, Azure, etc.),...
Principal
- Speakeasy in San Francisco is looking for a candidate to enhance product reliability and performance by collaborating with a dynamic team. The role involves identifying architectural changes, fostering a reliable culture, and participating in on-call rotations. The ideal...
- A leading language learning platform is seeking an experienced SRE Engineer to ensure the reliability and resilience of their infrastructure. Responsibilities include leading incident response, improving observability, and collaborating with various teams to enhance platform...
- LiveRamp is looking for a Senior Staff Site Reliability Engineer based in San Francisco, California, who will set the technical direction... ...global infrastructure. This role includes defining and owning the SRE strategy while mentoring staff engineers and overseeing...
$256k - $320k
...the only vertically integrated AI infrastructure company built... ...center construction, and cloud services. If you want to do... ...This Role We are looking for a Principal Software engineer to help design... ...team, considering reliability, scalability, operational costs...
Principal, Temporary work
- Algora Public Benefit Corporation is looking for an AI Cloud Infra Engineer to join their team in San Francisco. You will ensure the reliability of backend systems and work closely with engineers to plan for future growth. The ideal candidate has strong cloud infrastructure...
- Megaport is hiring a Senior Platform Engineer to enhance production reliability and ensure robust systems in a supportive environment. This... ...teams globally, championing DevOps practices, and working on cloud infrastructure technologies like AWS and Kubernetes. Ideal candidates...
$193.8k - $285k
About the Team The Reliability Platform role is a key pillar of DoorDash... ...the pragmatic perspective of an SRE, and deliver solutions with... ...frameworks, analytics tools, and AI Agent enablement to extract... ...Manage the team's budget for Cloud Provider Infra and 3rd party vendor...
Hourly pay, Work at office, Local area, Remote work, Flexible hours
- Senior Principal Front-End Network Engineer, AI Infrastructure Operations Houston; New York... ...Nscale Nscale is the GPU cloud engineered for AI. We provide... ...is focused on owning the reliability, scalability, and long-... ...Partnering with SRE, Compute Platform, Storage...
Principal, Flexible hours
- ...Biodesign in San Francisco is seeking a Senior Staff Cloud Support Engineer to lead technical escalations and... ...decisions while ensuring high availability for AI workloads. The ideal candidate has over 8 years of SRE experience, deep knowledge of Kubernetes, and strong...
