Senior Site Reliability Engineer, Production Engineering
$166k - $220k
Dormont Manufacturing Co
Anduril Industries is a defense technology company with a mission to transform U.S. and allied military capabilities with advanced technology. By bringing the expertise, technology, and business model of the 21st century’s most innovative companies to the defense industry, Anduril is changing how military systems are designed, built and sold. Anduril’s family of systems is powered by Lattice OS, an AI-powered operating system that turns thousands of data streams into a realtime, 3D command and control center. As the world enters an era of strategic competition, Anduril is committed to bringing cutting‑edge autonomy, AI, computer vision, sensor fusion, and networking technology to the military in months, not years.
ABOUT THE TEAM
The Production Engineering team is a newly formed organization within Anduril’s Software Platform, dedicated to ensuring the reliability, performance, and scalability of mission‑critical systems that directly support our warfighters in the field. We solve complex reliability challenges at massive scale, ensuring that critical components of Lattice, Anduril’s autonomous command and control platform, operate flawlessly in the most demanding operational environments. This is a foundational role and you will be among the first hires building this team from the ground up. You’ll have the unique opportunity to shape the technical direction, establish best practices, and define what production engineering excellence means at Anduril. Our team operates at the intersection of software engineering and systems reliability, building the infrastructure, tooling, and processes that keep our systems operational 24/7/365.
ABOUT THE ROLE
We are seeking an experienced Senior Site Reliability Engineer who is passionate about building resilient, highly available systems that scale to meet the demands of the core systems powering Lattice. You will work closely with platform engineering teams, product developers, and field operations to proactively identify reliability risks, implement defensive strategies, and continuously improve the operational excellence of our software platform. If you thrive on solving hard problems at scale and want your work to have direct impact on national security, this is the role for you.
WHAT YOU’LL DO
- Design and implement comprehensive monitoring, observability, and alerting systems to ensure early detection of reliability issues across the Lattice platform
- Drive incident response and conduct blameless postmortems to identify systemic improvements and prevent recurrence of production issues
- Build and maintain infrastructure automation using tools like Terraform, Kubernetes operators, and custom tooling to manage large‑scale distributed systems
- Establish and track Service Level Objectives (SLOs) and Error Budgets to balance feature velocity with system reliability
- Partner with software engineering teams to improve system architecture for reliability, implementing patterns like circuit breakers, graceful degradation, and chaos engineering
- Develop capacity planning models and performance testing frameworks to ensure systems can handle growth and peak operational demands
- Create runbooks, documentation, and training materials to enable teams to operate production systems effectively
- Lead cross‑functional efforts to improve deployment safety through progressive rollouts, automated testing, and rollback capabilities
- Implement security best practices and compliance controls for production environments handling sensitive defense data
- Build tooling and automation to reduce toil and improve operational efficiency for the engineering organization
- Participate in on‑call rotations and serve as an escalation point for critical production incidents
REQUIRED QUALIFICATIONS
- 7+ years of engineering experience, including at least 3 years focused on SRE, production operations, or infrastructure engineering
- Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience
- Deep expertise with Kubernetes in production environments, including operational challenges at scale (100+ nodes)
- Strong programming skills in one or more languages such as Go, Python, Rust, or Java, with the ability to build production‑grade tooling
- Proven experience designing and implementing observability stacks (metrics, logging, tracing) using tools like Prometheus, Grafana, ELK/EFK, or equivalent
- Hands‑on experience with cloud platforms (AWS, Azure, or GCP) and infrastructure as code practices
- Demonstrated ability to debug complex distributed systems issues across multiple layers of the stack
- Track record of improving system reliability through architectural changes, not just operational band‑aids
- Strong incident management and communication skills, with experience leading responses to critical outages
- Must be a U.S. Person due to required access to U.S. export controlled information or facilities
- Eligible to obtain and maintain an active U.S. Secret security clearance
PREFERRED QUALIFICATIONS
- Experience with defense, aerospace, or other mission‑critical systems where downtime has severe consequences
- Expertise in performance optimization and capacity planning for high‑throughput, low‑latency systems
- Knowledge of chaos engineering principles and experience implementing resilience testing frameworks
- Experience with service‑mesh technologies (Istio, Linkerd) and advanced traffic management patterns
- Background in database operations and optimization (PostgreSQL, Cassandra, or similar at scale)
- Familiarity with CI/CD platforms and deployment automation (ArgoCD, FluxCD, Spinnaker, Jenkins)
- Understanding of networking fundamentals including load balancing, DNS, TLS/SSL, and network security
- Experience with configuration management and secrets management solutions (Vault, Sealed Secrets, SOPS)
- Strong written and verbal communication skills with ability to explain technical concepts to non‑technical stakeholders
- Active Secret or higher security clearance
US Salary Range
$166,000 - $220,000 USD
The salary range for this role is an estimate based on a wide range of compensation factors, inclusive of base salary only. Actual salary offer may vary based on (but not limited to) work experience, education and/or training, critical skills, and/or business considerations. Highly competitive equity grants are included in the majority of full‑time offers and are considered part of Anduril’s total compensation package. Additionally, Anduril offers top‑tier benefits for full‑time employees, including:
Healthcare Benefits
- US Roles: Comprehensive medical, dental, and vision plans at little to no cost to you.
- UK & AUS Roles: We cover full cost of medical insurance premiums for you and your dependents.
- IE Roles: We offer an annual contribution toward your private health insurance for you and your dependents.
Additional Benefits
- Income Protection: Anduril covers life and disability insurance for all employees.
- Generous time off: Highly competitive PTO plans with a holiday hiatus in December. Caregiver & Wellness Leave is available to care for family members, bond with a new baby, or address your own medical needs.
- Family Planning & Parenting Support: Coverage for fertility treatments (e.g., IVF, preservation), adoption, and gestational carriers, along with resources to support you and your partner from planning to parenting.
- Mental Health Resources: Access free mental health resources 24/7, including therapy and life coaching. Additional work‑life services, such as legal and financial support, are also available.
- Professional Development: Annual reimbursement for professional development.
- Commuter Benefits: Company‑funded commuter benefits based on your region.
- Relocation Assistance: Available depending on role eligibility.
Retirement Savings Plan
- US Roles: Traditional 401(k), Roth, and after‑tax (mega backdoor Roth) options.
- UK & IE Roles: Pension plan with employer match.
- AUS Roles: Superannuation plan.
The recruiter assigned to this role can share more information about the specific compensation and benefit details associated with this role during the hiring process. To view Anduril’s candidate data privacy policy, please visit


