Site Reliability Engineer

$142.7k - $158.3k

General Dynamics Mission Systems


Basic Qualifications
Bachelor's degree in Software Engineering, or related Science, Technology, Engineering or Mathematics field, plus a minimum of 8 years of relevant experience; or Master's degree, plus 6 years relevant experience.
Responsibilities for this Position
What You'll Own
  • SLOs and reliability metrics. Define service level objectives for every AI service that goes to production. Establish error budgets and use them to drive engineering decisions - not just measure uptime.
  • Monitoring and observability. Build and maintain monitoring, logging, and alerting infrastructure for AI services. You will know when something is degrading before users do.
  • Incident response. Establish incident management procedures, lead post-incident reviews, and drive corrective actions. When something breaks, you coordinate the response and ensure it doesn't break the same way again.
  • Operational readiness reviews. Before any AI service goes live, you validate that it meets reliability, security, and operational standards. You are the gate between "it works in dev" and "it's ready for production."
  • Capacity planning and cost monitoring. Track resource consumption, forecast capacity needs, and monitor costs - tokens, compute, storage. You ensure the platform scales without surprises.
  • Toil elimination. Identify and automate repetitive operational tasks. If a human is doing something a script could do, you fix that.
What You Won't Own
  • Application development or AI model building - you ensure what they build is operable; you don't build it
  • Infrastructure provisioning - IT provides the infrastructure; you define what's needed and validate it works
  • Business process decisions or backlog prioritization
What Makes This Role Different
  • AI services have failure modes that traditional applications don't - model drift, token budget exhaustion, prompt injection, upstream data quality degradation. You will build monitoring for problems that most SRE teams have never encountered.
  • You are applying SRE principles from scratch. There is no existing SRE practice to inherit - you will define it for the platform.
  • Your operational readiness reviews directly determine whether AI services go live. You have real authority to say "not ready."
Required Qualifications
  • Bachelor's degree in Computer Science, Software Engineering, or a related field, plus 5 years of experience; or Master's degree plus 3 years of experience
  • Production SRE or DevOps experience - you have owned the reliability of systems that real users depended on, not just built CI/CD pipelines
  • Hands-on experience with monitoring and observability tools - Prometheus, Grafana, Datadog, ELK, CloudWatch, or similar. You have built dashboards and alerts that caught real problems.
  • Strong scripting and automation skills - Python, Bash, infrastructure-as-code (Terraform, CloudFormation, or similar)
  • Experience with containerized environments - Docker, Kubernetes, container orchestration at scale
  • Experience defining and managing SLOs, error budgets, and incident response procedures in production
  • U.S. citizenship required. Department of Defense Secret security clearance is required at time of hire.
Preferred Qualifications
  • Experience with AI/ML production systems - model serving, inference monitoring, token cost tracking, or similar
  • Multi-cloud experience (AWS, Azure, GCP) including cloud-native monitoring and logging services
  • Experience building operational readiness review processes or production launch checklists
  • Familiarity with Google SRE principles - you have read the book and applied the concepts, not just referenced them in interviews
  • Experience in environments where reliability has compliance or safety implications - defense, healthcare, finance, or critical infrastructure
What Sets You Apart
  • You think about failure before you think about features. Your first question about any new system is "how does this break?"
  • You automate yourself out of toil. If you're doing the same thing twice, you write a script.
  • You have said "not ready" to a team that wanted to ship, and you were right.
  • You build monitoring that tells you what's wrong, not just that something is wrong.
  • You write post-incident reviews that actually change how systems are built, not just how incidents are documented.
Details
  • Remote - 100% telework
  • 9/80 schedule
  • Defense industry experience is not required

Target salary range: USD $142,696.00/Yr. - USD $158,303.00/Yr. This estimate represents the typical salary range for this position based on experience and other factors (geographic location, etc.). Actual pay may vary. This job posting will remain open until the position is filled.

Company Overview

General Dynamics Mission Systems (GDMS) engineers a diverse portfolio of high-technology solutions, products and services that enable customers to successfully execute missions across all domains of operation. With a global team of 12,000+ top professionals, we partner with the best in industry to expand the bounds of innovation in the defense and scientific arenas. Given the nature of our work and who we are, we value trust, honesty, alignment and transparency. We offer highly competitive benefits and pride ourselves on being a great place to work with a shared sense of purpose. You will also enjoy a flexible work environment where contributions are recognized and rewarded. If who we are and what we do resonates with you, we invite you to join our high-performance team!


Equal Opportunity Employer / Individuals with Disabilities / Protected Veterans

Vacancy posted 1 day ago