Lead Site Reliability Engineer

Intellum

Job Description

About us

Intellum is the leader in corporate education technology and powers the largest, most successful customer, partner, and employee learning programs in the world. Large brands and fast-moving companies like Google, Meta, Amazon, Walmart, Xero, Atlassian, Mailchimp, Airbnb, Stripe, and TikTok rely on Intellum to engage and educate the audiences they touch.

We have always been a "remote first" company and are proud to have team members located all over the world. We value Curiosity, Creativity, Perseverance, and Kindness and strive to demonstrate these core values every day.

Our culture is very important to us. We invest in our people in fun and exciting ways, including personal development budgets and an annual all-company retreat that is focused less on work and more on human connections. We are in growth mode, and our "smart growth" approach ensures that we will continue to scale our company effectively.

Summary

We are seeking a Lead Site Reliability Engineer to spearhead our SRE team. You are not just an operator; you are an experienced software engineer who excels at architecture, code optimization, and deep troubleshooting. In this role, you will drive operational maturity by defining our reliability standards (SLOs), hardening our security posture (WAF/InfraSec), and scaling the Intellum platform.

Our stack

  • Core: Applications written in Ruby on Rails and Node.js, PostgreSQL, MongoDB, Redis, Memcached, Sidekiq, ActiveJob, Elasticsearch, WebSockets
  • Infrastructure: 100% Linux-based cloud infrastructure (AWS, Google Cloud, MongoDB Atlas) and services (ECS/EC2/Kubernetes, ElastiCache, Memorystore, RDS, Cloud SQL, BigQuery, etc.)
  • Infrastructure as Code (IaC): GitHub, Terragrunt, Terraform, Ansible
  • CI/CD: Spinnaker, Jenkins
  • Observability & Alerting: New Relic, AWS CloudWatch, Google Cloud Stackdriver, Squadcast
  • Agile/Scrum practices utilizing Jira

Responsibilities

  • SRE Leadership & Strategy: Set clear goals for the SRE team and partner with Engineering leadership to align platform initiatives with business objectives.
  • Reliability & Observability (SLA/SLO): Lead the definition and enforcement of SLAs, SLIs, and SLOs. Architect observability frameworks to translate telemetry data into actionable roadmaps that reduce toil and enhance resilience.
  • Core Engineering & Performance: Take ownership of critical code components (i.e., Queues, Enrollments) and lead efforts to identify bottlenecks, optimize performance, and improve code quality across the engineering department.
  • Security by Design: Champion infrastructure security. Partner with InfoSec to define hardening standards, manage perimeter defense (WAF/DDoS), and automate vulnerability remediation within the CI/CD pipeline.
  • Incident Command: Participate in the 24x7 on-call rotation and lead post-incident reviews (RCAs), ensuring action items are implemented to improve MTTR and prevent recurrence.
  • Mentorship: Empower developers with better tooling and guidance on performant coding practices, fostering a culture of collaboration, reliability, and "you build it, you run it".

Required Skills

Experience & Engineering

  • 10+ years of engineering experience, with 5+ years specifically developing Ruby on Rails applications.
  • Expertise in Cloud Computing (AWS/GCP) and Infrastructure as Code (Terraform/Ansible).
  • Strong proficiency with SQL databases (PostgreSQL) and the ability to quickly navigate and optimize complex, unfamiliar codebases.

SRE & Operations

  • Deep Observability: Proven experience designing monitoring solutions (Datadog, New Relic, Prometheus) based on the "Golden Signals".
  • SLO Governance: Demonstrated ability to define SLIs/SLOs from scratch, negotiate Error Budgets, and use data to balance feature velocity with reliability.
  • Security Focus: Experience securing cloud environments and container platforms (Kubernetes), including hands-on management of WAF rules and edge security.
  • Incident Management: Experience leading post-incident reviews (RCAs) and implementing action items that directly improve MTTR (Mean Time to Recovery) and MTTD (Mean Time to Detection).

Leadership

  • Proven experience leading technical teams, mentoring engineers, and working in a team-oriented, collaborative environment with strong communication skills.
  • Documentation & Training: Skilled in documenting solutions and training operational teams on how to effectively support and maintain systems.
  • Proactive Problem-Solving: Demonstrated ability to communicate clearly, seek help proactively, and take ownership of tasks, leading them to completion.

Bonus Skills

  • Automation Tools: Experience in developing solutions using server automation tools such as Terraform and Ansible.
  • CI/CD Expertise: Experience in writing and maintaining CI/CD pipelines and services.
  • Kubernetes: Experience in building, deploying, and optimizing Kubernetes-based infrastructure.
  • Perimeter Defense: Experience configuring and managing Web Application Firewalls (WAF) (e.g., Cloudflare, AWS WAF, Akamai) and DDoS protection mechanisms.

Education

  • Bachelor's degree in Computer Science or related technical field

Benefits

  • Medical - 100% of employee premiums for selected individual plans
  • Dental - 100% of employee premiums covered
  • Vision - 100% of employee premiums covered
  • LinkedIn Learning
  • 401(k) plus matching (US Based Only)
  • Unlimited PTO
  • Calm subscription
  • Annual Company Retreat

Intellum is an equal-opportunity employer.
We're committed to building an inclusive team that celebrates diversity in people, perspectives, and backgrounds regardless of race, color, national origin, gender, sexual orientation, age, religion, disability, citizenship, veteran status, or any other protected status.

We encourage you to apply for an open position, and if you have questions about whether or not your job experience and skill set meet the requirements for a specific role, reach out to us directly at View email address on click.appcast.io.

If you are an individual applying from CA, NY, CO, CT, MD, NV, or RI, please reach out to View email address on click.appcast.io to inquire about specific pay ranges.

Vacancy posted 8 hours ago