Lead Site Reliability Engineer

Bridge Defense

Bridge Defense is redefining how modern defense technology is delivered. Based in Washington, D.C., we are built for the dynamic mission environment facing the Department of Defense, the Intelligence Community, and federal law enforcement agencies. We provide full-spectrum national security solutions that combine secure infrastructure, cleared talent, and mission-ready software to meet evolving defense challenges. Our services include secure software development in classified environments and the design and implementation of advanced IT and cybersecurity capabilities ranging from secure cloud architectures and enterprise infrastructure to data center operations, scientific analysis, and cutting-edge cyber defense.

We are led by technologists and veterans with firsthand mission experience, which enables us to understand both the operational realities and the innovation needed to succeed. Our approach is agile and outcome-based, delivering results in weeks rather than months whenever possible.

At Bridge Defense we value people, integrity, and excellence. We foster an environment where innovation thrives in support of traditional mission requirements. Our team members receive competitive compensation, robust benefits, professional development and certification opportunities, and clear paths for growth while working on the nation's most critical projects.

Core Values
  • Innovation & Responsiveness: We push beyond legacy models with efficient, tech-led solutions built to scale and evolve.
  • Trusted Performance: Security, compliance, and deep experience in delivering to demanding environments guide all we do.
  • Mission Focused Expertise: From veteran leadership to cleared engineers, our people understand both the technology and the mission.
About the Role

As the Lead Site Reliability Engineer for our ComputeBridge Engagement, you'll be responsible for the reliability, scalability, and performance of one of the largest hardware and AI infrastructure efforts in the U.S. defense sector. You will lead the deployment, management, and automation of a high-performance computing mesh across multiple secure environments, ensuring operational excellence and mission continuity for a 9-figure government program.

This is a hands-on engineering leadership role that bridges physical infrastructure and modern DevOps automation, ideal for someone who thrives at the intersection of hardware systems, distributed computing, and AI/ML workflows.

What You'll Do
  • Lead infrastructure design, deployment, and operations for ComputeBridge hardware clusters across secure and distributed environments
  • Install and configure physical systems, including high-density GPU servers, networking gear, and storage arrays
  • Build and deploy secure Linux images and containerized workloads using OpenShift and other orchestration platforms
  • Develop and manage automation pipelines for provisioning, configuration management, and monitoring using modern DevOps toolchains (Ansible, Terraform, etc.)
  • Operate and maintain distributed networking meshes across multiple classified and unclassified domains
  • Implement and manage out-of-band management tools (IPMI, iDRAC, BMC, etc.) for remote troubleshooting and control
  • Integrate and optimize NVIDIA GPU infrastructure for AI/ML training and inference workloads
  • Collaborate with mission engineers, software teams, and government operators to ensure system readiness and performance
  • Provide on-site technical leadership for deployments, troubleshooting, and continuous improvement
  • Mentor junior engineers and establish operational best practices across the ComputeBridge program as the contract grows
What You'll Bring
  • 3+ years of experience in site reliability, systems engineering, or hardware operations roles
  • Deep expertise with physical infrastructure: server racking, cabling, diagnostics, and troubleshooting
  • Strong experience with Linux systems administration, imaging, and automated deployment
  • Hands-on experience managing large-scale clusters or distributed systems in OpenShift or Kubernetes environments
  • Familiarity with DevOps automation (Ansible, Terraform, CI/CD pipelines)
  • Experience configuring and managing networking and mesh architectures
  • Direct experience with NVIDIA GPUs, CUDA, and related AI/ML frameworks
  • Proficiency with out-of-band management and IPMI/iDRAC tooling
  • Certifications: Linux+ and Security+ (required or in-progress)
  • Excellent communication, documentation, and problem-solving skills
  • Clearance: Active TS/SCI required, or the ability to obtain it
Bonus Points For
  • Experience operating in secure DoD or intelligence environments
  • Familiarity with Palantir platforms or other government data systems
  • Prior experience supporting AI/ML infrastructure in production or tactical settings
  • Experience with performance tuning and monitoring of HPC or GPU-accelerated clusters
General Factors
  • Depending on project requirements, you may be required to work within a compressed schedule; expect overtime when schedules demand it.
  • Willing to travel, if needed.
  • No Relocation.
Why Bridge Defense
  • Shape how advanced computing supports national security missions at scale
  • Lead engineering for a major government program with direct mission impact
  • Competitive compensation, benefits, and growth opportunities in a mission-driven environment

Bridge Defense is committed to building a collaborative and mission-focused team. Bridge Defense reserves the right to modify job duties or requirements at any time. Employment with Bridge Defense is at-will. Candidates must be eligible to work in the United States and complete any required background checks or security clearance processes as a condition of employment.

Vacancy posted 5 days ago