Manager, Site Reliability Engineering

Paradigm

Paradigm is a software company transforming the way that the residential, construction & building product industries operate across the globe. We are looking for a Manager, Site Reliability Engineering to be part of revolutionizing these industries.

We're looking for a hands‑on SRE leader to build and develop a high‑performing team that oversees reliability across our Azure‑based platform. You'll promote modern SRE practices, drive down incident response times, and shape a culture where automation replaces toil and every incident becomes a learning opportunity.

This role combines technical depth with people leadership. You'll design reliability frameworks, lead incident response, coach engineers, and partner with product teams to embed reliability into everything we build. Working closely with the Senior Director of SRE & Cloud Operations, you'll transform reactive operations into proactive, data‑driven service management, with increasing use of AI and automation to get there faster.

What You Will Do:
  • Lead and grow a team of site reliability engineers. Provide guidance, mentorship, and career development.
  • Contribute to and mature SRE practices across production services: SLOs, SLIs, error budgets, toil reduction, and blameless post‑mortems that turn incidents into lasting improvements.
  • Oversee the incident management lifecycle end‑to‑end, including detection, response, resolution, post‑incident review, and systemic improvement.
  • Design on‑call rotations, runbooks, and escalation procedures that balance service reliability with engineer well‑being and sustainable work practices.
  • Drive measurable reductions in MTTR and MTTD through improved observability, intelligent automation, and predictive monitoring.
  • Build automation to eliminate manual operational work, including provisioning, deployment, scaling, self‑healing, and reporting.
  • Implement chaos engineering practices to validate system resilience and surface weaknesses before they cause outages.
  • Partner with engineering and product teams to embed reliability requirements into the development lifecycle, from design through deployment.
  • Collaborate with the observability team to ensure comprehensive instrumentation, smart alerting, and actionable dashboards across all critical services.
  • Measure, report, and advocate for reliability improvements with both technical and executive stakeholders, using data to drive investment decisions.

What You Need to Succeed:
  • Bachelor's degree in Engineering or a related field, or equivalent experience.
  • 7+ years in site reliability engineering, DevOps, or infrastructure engineering, with at least 1 year in people management (or demonstrated tech lead experience with direct influence over team processes and career growth).
  • Hands‑on experience running production systems on Azure (including proficiency with key services such as AKS, App Services, Service Bus, Event Grid, and Azure Monitor) or comparable cloud platforms.
  • Proven track record implementing SRE practices with measurable reliability improvements, and familiarity with modern observability platforms (Datadog, Prometheus/Grafana, or equivalent). AI‑enhanced observability experience is preferred.
  • Experience leading incident response for high‑severity production issues and running effective post‑mortems.
  • Strong background in automation, infrastructure as code (Terraform, Bicep, or similar), and systematically eliminating manual operational work.
  • Production‑grade operational experience with Kubernetes container orchestration.
  • Ability to automate workflows and build scripts using Python, Bash, PowerShell, or Go.
  • Experience with AI coding assistants and CI/CD systems (GitHub Actions, Azure DevOps, ArgoCD) with automation capabilities is preferred.
  • Knowledge of distributed systems patterns is preferred.
  • Exposure to AIOps platforms or using LLMs for operational automation is preferred.
  • Strong communication skills, with the ability to make complex technical issues clear for both engineers and executives.
  • Data‑driven approach: you use metrics and telemetry to guide decisions, not gut feel.
  • Cross‑functional collaboration: you build trust and alignment naturally.

Vacancy posted 7 hours ago
Location: Irving, TX