Staff Site Reliability Engineer, Cloud

$165k - $200k

Israelvcforum

Who we are

Kentik is the network intelligence platform for modern infrastructure teams. Unlike traditional monitoring and observability tools, we demystify complex network operations, enabling organizations to deliver applications and innovation at scale. Built by network experts to make critical insight accessible to every engineer, Kentik is the real-time source of truth that understands every network in context, from data center to cloud to the internet. This single platform unifies and correlates cloud, device, flow, and synthetic data to turn telemetry into action. Market leaders like Akamai, Booking.com, Dropbox, and Zoom rely on Kentik to run, manage, and optimize their networks.

What we do

Our platform ingests trillions of records and serves hundreds of thousands of queries for our users each day. You will gain experience building a production-quality, high-performance server-and-client SaaS application that handles uniquely high volumes of data. We have built a team of world-class engineers, network experts, and technology thought leaders in a culture that has been remote-friendly from day one. While prior experience in a remote environment is not required, we highly value strong collaboration and communication skills, as well as a high level of independence and autonomy.

What you'll do

Kentik is looking for a Staff-level Site Reliability Engineer (Cloud) to join our Product Engineering team to help build and maintain our Synthetics and Cloud product lines. These products have multiple applications deployed in various cloud providers all over the world. We manage these cloud applications using observability tooling, automated build processes, and adherence to configuration-as-code best practices. We're looking for an experienced engineer who will work with engineering teams across the company to help grow our hardware and software infrastructure. We operate a well-organized, well-instrumented platform, and offer enormous opportunities for employee growth.

- Make sure our real-time, scalable infrastructure is set up for growth and working efficiently. Our infrastructure runs on our own hardware across multiple locations, as well as on all major cloud vendors
- Work on tools and processes to better monitor our platform and ensure its stability through our rapid growth
- Deep-dive into diverse topics, from firewalls and IP routing to database replication strategies and automating build processes
- Collaborate with engineering and infrastructure teams on finding solutions from an operational perspective
- Assist with expanding our cloud deployments across the major cloud providers
- Contribute code, code reviews, and tools or patches to all kinds of existing code
- Write design documents or collaborate on colleagues' docs to introduce new features or changes into our infrastructure
- Provide valuable feedback on team goals, projects, and processes. We believe in continuously improving our team

What you'll bring

Studies have shown that some candidates tend to apply to jobs only if they meet 100% of the qualifications. We encourage you to apply if you meet most of the criteria - even if you don't match all of the qualifications, your skills and experience could be valuable in this role!

- 8+ years of experience in cloud-based systems administration, IT, and/or SRE-related projects
- Expertise in public cloud environments such as AWS, GCP, Azure, or OCI
- Strong command of containerization and orchestration using Docker and Kubernetes
- Solid programming and automation skills using Bash, Python, or Go
- Proficiency with Infrastructure as Code (IaC) and configuration management platforms such as Terraform, Ansible, and Puppet
- Proficiency in Linux administration and command-line tools (e.g., SSH, grep, awk)
- Detailed understanding of major internet protocols (TCP/IP, DNS, TLS)
- Networking administration experience: concepts such as routing, firewalls (iptables), and peering sound familiar
- A passion for documenting code, processes, and infrastructure in runbooks and wikis
- Experience with metrics and monitoring solutions such as Grafana, Prometheus, Telegraf, and OpenTelemetry
- Experience creating and managing tickets with third-party vendors and owning cloud vendor partner relationships

Nice to haves:

- Familiarity with Kubernetes automation tools, specifically managing complex deployments with Helm and Helmfile
- Knowledge of scaling Kubernetes workloads and compute infrastructure
- Experience optimizing CI/CD build and deploy pipelines using GitHub Actions and Jenkins
- Exposure to PagerDuty integrations
- Knowledge of SRE, DevOps, and GitOps practices and principles

Our tech stack

- Our core data engine and platform are primarily written in Go
- We use Node.js + Express for application serving, and React as our primary UI framework
- We also use some JS and Python for tooling/scripting
- In addition to our own database, we use Postgres, Kafka, MySQL, and Redis
- Internal and public APIs expose both REST/JSON and gRPC endpoints
- HAProxy and Envoy for API traffic routing and balancing
- GitHub for source control, PRs, and issues
- Jenkins for automated builds

What we offer

Kentik is a fully remote company that operates globally. We seek professionals who will help us thrive as an organization and whose careers we can, in turn, broaden and enhance. We're very thorough in the interview process to understand your skills and how they will relate to your successful growth here at Kentik.

Our compensation philosophy encompasses a fair program for all in order to attract, engage, and retain talented individuals who will drive our business and wow our customers. The compensation range for this position is $165,000 - $200,000. This range reflects the low and high end of the U.S. compensation range Kentik reasonably and generally expects to pay the hired candidate in this role. The actual compensation offered may be lower or higher than the stated range depending on various factors, including but not limited to:

- Experience with the skill sets required for success
- Demonstrated competencies and potential
- A geographic market-based approach

In addition to a great career opportunity, Kentik offers stellar benefits for our employees, which include:

- 100% of premiums paid by the company for health, vision, and dental coverage for you and your dependents
- An annual Health Reimbursement Account (HRA) of $3,000 for an individual or $4,500 for a family
- Paid family & medical leave
- Open PTO, a quarterly Wellness Day, and a minimum of 10 paid holidays
- 401(k) retirement account
- Home office reimbursement
- Stock options

Note: Benefits are as listed for all US full-time employees. For compensation, international applicants will be treated equitably in relation to the laws applicable within the countries in which we operate.

Come work with us

The true meaning of Kentik is visibility. We're committed to making sure everyone feels empowered to use their voice, has a sense of belonging, and is represented at Kentik. We don't look for individuals who fit the culture, but those who will continue to add to the culture. We encourage everyone to apply, especially those individuals who are underrepresented in the industry: people of color, LGBTQI+ community, women, individuals with disabilities (both seen and unseen), veterans, and people of any age or family status.

Kentik is committed to creating an inclusive interview process. If you require a reasonable accommodation during the application or interview process, please reach out to us (email address available via click.appcast.io).

Come as you are!

You will be working at a fast-growing, well-funded startup alongside industry thought leaders and network aficionados as we build the future of observability and set the high bar for how network operations and digital businesses should run. With a competitive salary and amazing benefits on top of the meaningful and challenging projects you’ll take on, we’re sure you’ll enjoy joining the Kentik team.


Vacancy posted 3 hours ago
