Site Reliability Engineer [Remote]

$160k - $230k

Together AI

Remote job

As a Site Reliability Engineer (SRE) at Together, you are responsible for keeping all user-facing services and production systems running smoothly. You are a blend of pragmatic operator and software engineer who applies sound engineering principles, operational discipline, and mature automation to our operating environments and codebase.

You specialize in systems (operating systems, storage subsystems, networking), while implementing best practices for availability, reliability and scalability, with varied interests in algorithms and distributed systems.

Requirements

5+ years of professional SRE or related experience
Bachelor's degree in Computer Science or a related field, or equivalent work experience
Expert knowledge of Ansible (roles, playbooks), Terraform, and Kubernetes
Proficiency in programming/scripting languages
Direct experience in monitoring and observability practices
Advanced knowledge of cloud services
Ability to thrive in a collaborative environment involving different stakeholders and subject matter experts

Responsibilities

Be on an on-call (PagerDuty) rotation to respond to incidents that impact availability
Build and run our infrastructure with Ansible, Terraform, and Kubernetes to enable scaling to a massive number of concurrent users
Build monitoring systems to ensure the highest quality service for our customers
Design and implement operational processes (such as deployments and upgrades)
Debug production issues across all services and levels of the stack
Identify improvements for the product architecture from the reliability, performance and availability perspectives
Plan the growth of Together AI’s infrastructure

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers and engineers on our journey to build the next-generation AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance and other competitive benefits. The US base salary range for this full-time position is: $160,000 - $230,000 + equity + benefits. Our salary ranges are determined by location, level and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

Please see our privacy policy at

Vacancy posted more than 2 months ago
