Site Reliability Engineer [Remote]
$160k - $230k
Together AI
- Remote job
As a Site Reliability Engineer (SRE) at Together, you are responsible for keeping all user-facing services and production systems running smoothly. You are a blend of pragmatic operator and software engineer who applies sound engineering principles, operational discipline, and mature automation to our operating environments and codebase.
You specialize in systems (operating systems, storage subsystems, networking), implement best practices for availability, reliability, and scalability, and have broad interests in algorithms and distributed systems.
Requirements
- 5+ years of professional SRE or related experience
- Bachelor's degree in Computer Science or a related field or equivalent work experience
- Expert knowledge of Ansible (roles, playbooks), Terraform, and Kubernetes
- Proficiency in programming/scripting languages
- Direct experience in monitoring and observability practices
- Advanced knowledge of cloud services
- Ability to thrive in a collaborative environment involving different stakeholders and subject matter experts
Responsibilities
- Participate in an on-call (PagerDuty) rotation to respond to incidents that impact availability
- Build and run our infrastructure with Ansible, Terraform, and Kubernetes to enable scaling to a massive number of concurrent users
- Build monitoring systems to ensure the highest quality service for our customers
- Design and implement operational processes (such as deployments and upgrades)
- Debug production issues across all services and levels of the stack
- Identify improvements for the product architecture from the reliability, performance and availability perspectives
- Plan the growth of Together AI’s infrastructure
About Together AI
Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers and engineers on our journey to build next-generation AI infrastructure.
Compensation
We offer competitive compensation, startup equity, health insurance and other competitive benefits. The US base salary range for this full-time position is: $160,000 - $230,000 + equity + benefits. Our salary ranges are determined by location, level and role. Individual compensation will be determined by experience, skills, and job-related knowledge.
Equal Opportunity
Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.
Please see our privacy policy at