Lead Site Reliability Engineer
Intellum
About us

Intellum is the leader in corporate education technology and powers the largest, most successful customer, partner, and employee learning programs in the world. Large brands and fast-moving companies like Google, Meta, Amazon, Walmart, Xero, Atlassian, Mailchimp, Airbnb, Stripe, and TikTok rely on Intellum to engage and educate the audiences they touch.

We have always been a "remote first" company and are proud to have team members located all over the world. We value Curiosity, Creativity, Perseverance, and Kindness and strive to demonstrate these core values every day.

Our culture is very important to us. We invest in our people in fun and exciting ways, including personal development budgets and an annual all-company retreat that focuses less on work and more on human connection. We are in growth mode, and our "smart growth" approach ensures that we will continue to scale our company effectively.

Summary

We are seeking a Lead Site Reliability Engineer to spearhead our SRE team. You are not just an operator; you are an experienced software engineer who excels at architecture, code optimization, and deep troubleshooting. In this role, you will drive operational maturity by defining our reliability standards (SLOs), hardening our security posture (WAF/InfraSec), and scaling the Intellum platform.

Our stack

Core: Applications written in Ruby on Rails and Node.js; PostgreSQL, MongoDB, Redis, Memcached, Sidekiq, ActiveJob, Elasticsearch, WebSockets
Infrastructure: 100% Linux-based cloud infrastructure (AWS, Google Cloud, MongoDB Atlas) and services (ECS/EC2/Kubernetes, ElastiCache, Memorystore, RDS, Cloud SQL, BigQuery, etc.)
Infrastructure as Code (IaC): GitHub, Terragrunt, Terraform, Ansible
CI/CD: Spinnaker, Jenkins
Observability & Alerting: New Relic, AWS CloudWatch, Google Cloud Stackdriver, Squadcast
Agile/Scrum practices using Jira

Responsibilities

SRE Leadership & Strategy: Set clear goals for the SRE team and partner with Engineering leadership to align platform initiatives with business objectives.
Reliability & Observability (SLA/SLO): Lead the definition and enforcement of SLAs, SLIs, and SLOs. Architect observability frameworks that translate telemetry data into actionable roadmaps to reduce toil and enhance resilience.
Core Engineering & Performance: Take ownership of critical code components (e.g., Queues, Enrollments) and lead efforts to identify bottlenecks, optimize performance, and improve code quality across the engineering department.
Security by Design: Champion infrastructure security. Partner with InfoSec to define hardening standards, manage perimeter defense (WAF/DDoS), and automate vulnerability remediation within the CI/CD pipeline.
Incident Command: Participate in the 24x7 on-call rotation and lead post-incident reviews (RCAs), ensuring action items are implemented to improve MTTR and prevent recurrence.
Mentorship: Empower developers with better tooling and guidance on performant coding practices, fostering a culture of collaboration, reliability, and "you build it, you run it" ownership.

Required Skills

Experience & Engineering
10+ years of engineering experience, including 5+ years developing Ruby on Rails applications.
Expertise in cloud computing (AWS/GCP) and Infrastructure as Code (Terraform/Ansible).
Strong proficiency with SQL databases (PostgreSQL) and the ability to quickly navigate and optimize complex, unfamiliar codebases.

SRE & Operations
Deep Observability: Proven experience designing monitoring solutions (Datadog, New Relic, Prometheus) based on the "Golden Signals" (latency, traffic, errors, and saturation).
SLO Governance: Demonstrated ability to define SLIs/SLOs from scratch, negotiate error budgets, and use data to balance feature velocity with reliability.
Security Focus: Experience securing cloud environments and container platforms (Kubernetes), including hands-on management of WAF rules and edge security.
Incident Management: Experience leading post-incident reviews (RCAs) and implementing action items that directly improve MTTR (Mean Time to Recovery) and MTTD (Mean Time to Detection).

Leadership
Proven experience leading technical teams, mentoring engineers, and working in a team-oriented, collaborative environment with strong communication skills.
Documentation & Training: Skilled in documenting solutions and training operational teams to effectively support and maintain systems.
Proactive Problem-Solving: Demonstrated ability to communicate clearly, seek help proactively, and take ownership of tasks through to completion.

Bonus Skills
Automation Tools: Experience developing solutions with server automation tools such as Terraform and Ansible.
CI/CD Expertise: Experience writing and maintaining CI/CD pipelines and services.
Kubernetes: Experience building, deploying, and optimizing Kubernetes-based infrastructure.
Perimeter Defense: Experience configuring and managing Web Application Firewalls (WAFs) (e.g., Cloudflare, AWS WAF, Akamai) and DDoS protection mechanisms.

Education
Bachelor's degree in Computer Science or a related technical field

Benefits
Medical: 100% of employee premiums covered for selected individual plans
Dental: 100% of employee premiums covered
Vision: 100% of employee premiums covered
LinkedIn Learning
401(k) with matching (US-based employees only)
Unlimited PTO
Calm subscription
Annual company retreat

Intellum is an equal-opportunity employer.
We're committed to building an inclusive team that celebrates diversity in people, perspectives, and backgrounds, regardless of race, color, national origin, gender, sexual orientation, age, religion, disability, citizenship, veteran status, or any other protected status. We encourage you to apply for an open position, and if you have questions about whether your experience and skill set meet the requirements for a specific role, please reach out to us directly by email. If you are applying from CA, NY, CO, CT, MD, NV, or RI, please contact us by email to inquire about specific pay ranges.