Staff Site Reliability Engineer

$180k - $220k
Full-time

Gradle Technologies

Job Description

Who We Are

AI is changing how software gets built. Code production is becoming a commodity. The focus is shifting from writing code to orchestrating, verifying, and governing change – and the toolchain is the new constraint.

Gradle is at the center of this shift. We build Develocity, a toolchain observability and intelligence platform used by some of the world's leading software organizations – Netflix, Airbnb, Spotify, SAP, major global banks, and hundreds more. Develocity helps software teams achieve delivery excellence through deep observability, build and test acceleration, and AI-powered intelligence across the entire toolchain – with current support for Gradle Build Tool, Apache Maven™, sbt, npm, and Python.

We are an AI-native company. AI is not a feature we're bolting on – it's central to how we work, how we think about our product, and where we're heading. We're investing deeply in making Develocity's unique data and decades of domain expertise accessible to both humans and AI agents, with trust, evidence, and explainability at the core of everything we build.

We have partnered with the Apache Software Foundation, the Commonhaus Foundation, the Micronaut Foundation, and other OSS projects such as Spring, Quarkus, Kotlin, JUnit, AndroidX, and many more to bring the value of Develocity to the OSS community as well.

Our Values

Seek to Understand: Everything starts with listening and understanding; we strive to understand diverse viewpoints, problems, and motivations. Before we take action, we ensure we truly grasp the challenges, perspectives, and goals.

Know the Why: We approach our work with a clear sense of purpose, ensuring every step is deliberate and focused. We take meaningful action with urgency, but never at the expense of thoughtful consideration.

Innovate & Iterate: We embrace challenges and are not afraid to try new things, even if they might fail. With a deep understanding and a clear purpose, we can develop creative, bold solutions to tackle challenges.

Own the Outcome: We are empowered to take initiative, and we maintain transparency in our work and its outcomes. When we execute, we take responsibility for our decisions, measure the success of our innovations, and learn from the results.

Who You Are

We're building a new SRE team and looking for founding members to help shape how we operate. As a Lead SRE, you'll be a technical and operational leader for reliability across Develocity. You'll help define our SRE vision, set standards for how we operate production services, and mentor other SREs as the team grows. This is a hands-on role with broad influence across engineering, cloud platform, and customer-facing teams.

The SRE team will be responsible for the reliability, performance, and availability of Develocity instances serving paying customers, open-source projects, and public-facing services, plus supporting infrastructure like artifact registries.

You'll work on our internally built Cloud Application Platform (Kubernetes on AWS) and develop deep expertise in it. When incidents happen, you'll troubleshoot issues across the stack, from application to infrastructure. You'll collaborate with the Cloud Platform team to improve the tooling you depend on, and with engineering teams to build reliability into how we ship software. If you like automating things and hate doing the same task twice, you'll fit in well.

You'll be part of a distributed, remote-first team that values asynchronous communication and written documentation. Strong self-direction and clear communication across time zones are essential.

Responsibilities
  • Operate and maintain all Develocity instances and supporting services in production.
  • Define and evolve SRE standards, practices, and operating models, including on-call, incident response, postmortems, and SLOs.
  • Participate in a follow-the-sun on-call rotation, acting as a technical escalation point for complex or high-severity incidents.
  • Lead incident response and blameless retrospectives, ensuring learnings result in measurable reliability improvements.
  • Set reliability priorities using risk, customer impact, business goals, SLOs, and error budgets.
  • Identify systemic reliability risks and continuously evolve Develocity's SaaS operations as the platform and customer base grow.
  • Lead and influence architectural and design reviews to ensure reliability, scalability, and operability.
  • Drive automation across deployment, upgrades, monitoring, self-healing, recovery, and operational workflows.
  • Build and maintain comprehensive observability for all managed services, including logging, metrics, tracing, and alerting.
  • Own disaster recovery, backups, and business continuity planning and execution.
  • Partner with engineering leadership to balance feature delivery with reliability and operational excellence.
  • Mentor and coach SREs, supporting technical growth and strong operational practices.
  • Help onboard new SREs and contribute to hiring by defining and assessing SRE excellence at Develocity.
  • Communicate clearly with customers during incidents and maintenance windows.
  • Optimize performance, resource utilization, and operational costs.

Minimum qualifications
  • 7+ years in SRE, DevOps, or an equivalent role operating production services at scale.
  • Experience leading reliability initiatives across multiple teams or services.
  • Demonstrated ability to influence technical direction without direct authority.
  • Experience designing and operating systems with SLOs and error budgets, and exercising strong judgment in balancing reliability, velocity, and cost.
  • Strong Kubernetes experience in production environments.
  • Cloud infrastructure expertise, preferably AWS (EKS, RDS, S3, EC2).
  • Proficiency with observability tools (Prometheus, Grafana) and Infrastructure as Code (Terraform).
  • Track record of incident management and response in a 24/7 on-call environment.
  • Scripting proficiency (Python, Bash) for automation.
  • Strong written and verbal English communication skills.

Preferred qualifications
  • Experience as a founding or early SRE establishing practices in a growing SaaS organization.
  • Familiarity with Develocity.
  • JVM language experience (Java, Kotlin).
  • Experience with customer-facing and executive-level incident communications.

What We Offer
  • A ground-floor role in a new SRE team – you'll shape how we do things, not inherit someone else's decisions.
  • Real ownership of production systems used by engineers at companies you've heard of.
  • Direct interaction with customers when things go wrong (and when they go right).
  • A culture that values automation over heroics.
  • In-person meetings, such as our annual company offsite and team meetings.
  • Work from home in a remote-first environment.
  • Competitive salaries and equity grants.

Compensation

The US salary range for this position is $180k–$220k, which reflects the target ranges for all US locations. Within this range, individual pay is determined by geographic location and additional factors including, but not limited to, experience, relevant skills, qualifications, seniority, performance, and travel requirements. Our recruiting team can share more information about the specific salary range for your location during the hiring process.

Location
  • Remote from anywhere in the EST time zone.
  • While our team works remotely and is spread across the globe, we deeply value daily interactions and collaboration.

Vacancy posted a month ago