Site Reliability Engineer
$142.7k - $158.3k
General Dynamics Mission Systems
Basic Qualifications
Bachelor's degree in Software Engineering, or related Science, Technology, Engineering or Mathematics field, plus a minimum of 8 years of relevant experience; or Master's degree, plus 6 years relevant experience.
Responsibilities for this Position
What You'll Own
- SLOs and reliability metrics. Define service level objectives for every AI service that goes to production. Establish error budgets and use them to drive engineering decisions - not just measure uptime.
- Monitoring and observability. Build and maintain monitoring, logging, and alerting infrastructure for AI services. You will know when something is degrading before users do.
- Incident response. Establish incident management procedures, lead post-incident reviews, and drive corrective actions. When something breaks, you coordinate the response and ensure it doesn't break the same way again.
- Operational readiness reviews. Before any AI service goes live, you validate that it meets reliability, security, and operational standards. You are the gate between "it works in dev" and "it's ready for production."
- Capacity planning and cost monitoring. Track resource consumption, forecast capacity needs, and monitor costs - tokens, compute, storage. You ensure the platform scales without surprises.
- Toil elimination. Identify and automate repetitive operational tasks. If a human is doing something a script could do, you fix that.
What You Won't Own
- Application development or AI model building - you ensure that what other teams build is operable; you don't build it
- Infrastructure provisioning - IT provides the infrastructure; you define what's needed and validate it works
- Business process decisions or backlog prioritization
What Makes This Role Different
- AI services have failure modes that traditional applications don't - model drift, token budget exhaustion, prompt injection, upstream data quality degradation. You will build monitoring for problems that most SRE teams have never encountered.
- You are applying SRE principles from scratch. There is no existing SRE practice to inherit - you will define it for the platform.
- Your operational readiness reviews directly determine whether AI services go live. You have real authority to say "not ready."
Required Qualifications
- Bachelor's degree in Computer Science, Software Engineering, or a related field, plus 5 years of experience; or Master's degree plus 3 years of experience
- Production SRE or DevOps experience - you have owned the reliability of systems that real users depended on, not just built CI/CD pipelines
- Hands-on experience with monitoring and observability tools - Prometheus, Grafana, Datadog, ELK, CloudWatch, or similar. You have built dashboards and alerts that caught real problems.
- Strong scripting and automation skills - Python, Bash, infrastructure-as-code (Terraform, CloudFormation, or similar)
- Experience with containerized environments - Docker, Kubernetes, container orchestration at scale
- Experience defining and managing SLOs, error budgets, and incident response procedures in production
- U.S. citizenship required. Department of Defense Secret security clearance is required at time of hire.
Preferred Qualifications
- Experience with AI/ML production systems - model serving, inference monitoring, token cost tracking, or similar
- Multi-cloud experience (AWS, Azure, GCP) including cloud-native monitoring and logging services
- Experience building operational readiness review processes or production launch checklists
- Familiarity with Google SRE principles - you have read the book and applied the concepts, not just referenced them in interviews
- Experience in environments where reliability has compliance or safety implications - defense, healthcare, finance, or critical infrastructure
Who You Are
- You think about failure before you think about features. Your first question about any new system is "how does this break?"
- You automate yourself out of toil. If you're doing the same thing twice, you write a script.
- You have said "not ready" to a team that wanted to ship, and you were right.
- You build monitoring that tells you what's wrong, not just that something is wrong.
- You write post-incident reviews that actually change how systems are built, not just how incidents are documented.
Additional Details
- Remote - 100% telework
- 9/80 schedule
- Defense industry experience is not required
Target salary range: USD $142,696.00/Yr. - USD $158,303.00/Yr. This estimate represents the typical salary range for this position based on experience and other factors (geographic location, etc.). Actual pay may vary. This job posting will remain open until the position is filled.
Company Overview
General Dynamics Mission Systems (GDMS) engineers a diverse portfolio of high technology solutions, products and services that enable customers to successfully execute missions across all domains of operation. With a global team of 12,000+ top professionals, we partner with the best in industry to expand the bounds of innovation in the defense and scientific arenas. Given the nature of our work and who we are, we value trust, honesty, alignment and transparency. We offer highly competitive benefits and pride ourselves in being a great place to work with a shared sense of purpose. You will also enjoy a flexible work environment where contributions are recognized and rewarded. If who we are and what we do resonates with you, we invite you to join our high-performance team!
Equal Opportunity Employer / Individuals with Disabilities / Protected Veterans


