Senior Platform Reliability Engineer
$182k - $250k
Grow Therapy is on a mission to serve as the trusted partner for therapists growing their practice, and patients accessing high-quality care. Powered by technology, we are a three-sided marketplace that empowers providers, augments insurance payors, and serves patients. Following the mass increase in depression and anxiety, the need for accessibility is more important than ever. To make our vision for mental healthcare a reality, we’re building a team of entrepreneurs and mission-driven go-getters. Since launching in February 2021, we’ve empowered more than ten thousand therapists and hundreds of thousands of clients across the country and insurance landscape. We’ve raised more than $328M in funding, including our Series D, at a $3B valuation from Sequoia Capital, Transformation Capital, TCV, SignalFire, Menlo Ventures, Goldman Sachs Alternatives, and others.

About the Role

We’re hiring a Senior Platform Reliability Engineer to help define and scale reliability as a first-class capability at Grow. In this role you’ll operate horizontally across the organization, shaping how reliability is understood, measured, and built into the developer experience. You’ll work closely with other members of the platform team as well as our product engineering teams to establish standards around observability, SLOs/SLAs, and incident response, while also helping translate those standards into self-service tooling and “golden paths” that make it easy for teams to adopt them. This is a high-impact, highly autonomous role where you’ll drive both cultural and technical change, ultimately enabling teams to independently build and operate reliable systems at scale.

What You'll Work On

You’ll help us establish and scale reliability as a discipline at Grow by:

- Defining Reliability Standards: Establishing frameworks for SLOs/SLAs, error budgets, and operational readiness; helping teams understand what to measure and why it matters.
- Improving Observability & Measurement: Identifying gaps in metrics, logging, and tracing; ensuring services are measurable, debuggable, and aligned with reliability goals.
- Evolving Incident Response: Developing and improving incident response practices, from detection to post-incident learning, and helping teams build sustainable on-call and escalation patterns.
- Enabling Self-Service Reliability: Partnering with the platform team to build tooling and abstractions (e.g., service scorecards, dashboards, templates, golden paths) that make it easy for teams to adopt and stay compliant with reliability standards.
- Driving Adoption Across Teams: Working cross-functionally to educate, influence, and guide engineering teams, scaling reliability practices through a combination of clear standards, strong communication, and developer-friendly systems.

Who You Are

- Experienced in production systems: You have 6+ years of experience operating and improving reliability of production systems at scale.
- Strong foundation in cloud and infrastructure: You have hands-on experience with AWS, Kubernetes (e.g., EKS), and infrastructure as code tools like Terraform.
- Deep understanding of reliability principles: You’ve defined or worked with SLOs/SLAs, understand error budgets, and have experience improving reliability through measurement and iteration.
- Observability expertise: You’ve worked with modern observability tooling (we use DataDog) and understand how to build actionable monitoring systems across metrics, logs, and traces.
- Systems thinker: You’re able to zoom out, identify patterns across teams and services, and design solutions that scale beyond a single system.
- Impact-oriented: You focus on outcomes over output and care deeply about improving real reliability outcomes, not just adding processes.
- Strong communicator and influencer: You can drive change across teams without direct authority, balancing pragmatism with long-term vision.
- Self-directed: You thrive in ambiguous environments and are comfortable defining problems, proposing solutions, and executing independently.
- Team player: You collaborate well, communicate with empathy, and enjoy mentoring and learning from others.

Bonus Points

- You’ve helped introduce or scale reliability practices in a growing organization.
- You’ve built internal tooling or platforms used by multiple teams.
- You have experience designing service-level scorecards or compliance/reporting systems.
- You’ve worked with both SaaS (e.g., DataDog) and self-managed observability stacks.
- You were previously a product engineer and bring empathy for developer experience.
- You have experience with database reliability and performance (we use PostgreSQL).

Why This Role Is Exciting

This is a rare opportunity to define what reliability looks like at a growing, scaling engineering organization, and to do it in a way that actually sticks. You won’t just be responding to incidents or working within a single team. You’ll be shaping how reliability is measured, enforced, and experienced across the entire company. You’ll work alongside your teammates to turn best practices into intuitive, self-service systems that engineers rely on every day. Your work will directly improve system reliability, reduce incidents, and enable teams to move faster with confidence, ultimately making reliability a built-in property of how we build software at Grow.

Role Details

Employment Type: Full Time, Exempt
Base Compensation: The base compensation range for this position is $182,000–$250,000 USD annually. The base compensation for this role will vary depending on several factors, including relevant experience, qualifications, and the candidate’s working location.
Location: This is a hybrid role with the expectation to work onsite from our San Francisco, NYC, or Seattle hub location three days per week (Tuesday, Wednesday, and Thursday) and travel 2–3 times per year (e.g., company and department offsites).
Full Time Employee Benefits:

- Comprehensive Health Coverage: Medical, dental, and vision insurance, plus life and disability coverage.
- Parental Leave & Family Support: Up to 18 weeks paid leave and a new child stipend.
- Financial Wellness: 401(k) program and equity opportunities.
- Meals & Home Office Support: Stipends for home office setup and ongoing funds for meals, with tailored perks for both remote and in-office employees.
- Time Off to Recharge: Flexible PTO, 12 paid holidays, and a full winter break week.
- Wellness & Development: Annual stipends to put towards personal & professional growth.
- Mental & Physical Health Support: No-cost access to therapy through the Grow platform, weekly flexible hours for self-care (“Mental Health Mornings/Afternoons”) and memberships to leading wellness apps (such as One Medical, Headspace, and Talkspace).
- Extra Perks: Pet insurance discounts, commuter benefits, and global travel assistance.

Research shows that some groups hesitate to apply unless they meet every qualification. If you’re excited about this role but don’t check every box, we encourage you to apply. At Grow, we value diverse experiences, transferable skills, and the unique strengths each person brings.

Grow Therapy is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements.

By submitting your application, you acknowledge and consent to the use of automated tools as part of our recruitment process. Specifically, we use a third-party AI tool, Gem, to assist in the initial screening of resumes. Importantly, no hiring decisions are made by the AI tool.
All decisions about which candidates move forward are made by our human recruiting team after independent review. We are committed to transparency and fairness in our hiring practices. If you have questions about how our AI tools work, or would like more information about how your application will be processed, please contact us. If you require an accommodation due to a disability, or have concerns about the use of AI in the hiring process, please also contact us. We are happy to provide assistance or offer an alternative method of participating in the recruitment process.

