Sr Site Reliability Engineer

Commence

Job Description

Description:

At Commence, we’re the start of a new age of data-centric transformation, elevating health outcomes and powering better, more efficient processes for program and patient health. We combine quality data-driven solutions that fuel answers, technology that advances performance, and clinical expertise that builds trust to create a more efficient path to quality care.

With human-centered, healthcare-relevant, and value-based solutions, we create new possibilities with data. We provide proof beyond the concept and performance beyond the scope with a focus on efficiencies that transform the lives of those we serve. With a culture driven by purpose, straightforward communication and clinical domain expertise, Commence cuts straight to better care.

Requirements:

As a Senior Site Reliability Engineer at Commence, you will own the reliability, scalability, and operational health of our mission-critical healthcare data platform. You will bridge the gap between engineering and operations—embedding reliability as a first-class concern from architecture through deployment. This role is built for someone who thrives when systems are under pressure and who treats an outage as a problem to be engineered away permanently, not just survived.

  • Design, implement, and own observability infrastructure including metrics, logging, tracing, and alerting across distributed systems.
  • Define and enforce SLOs, SLIs, and error budgets in partnership with product and engineering teams.
  • Lead incident response: triage, coordinate remediation, conduct blameless post-mortems, and drive systemic fixes.
  • Build and maintain CI/CD pipelines that support rapid, safe delivery of changes to production.
  • Collaborate with engineering teams on infrastructure changes; able to read, modify, and contribute to existing infrastructure-as-code (Terraform or CloudFormation).
  • Design and operate highly available, fault-tolerant systems—including auto-scaling, failover, and disaster recovery strategies.
  • Reduce operational toil through automation; eliminate manual processes before they become habits.
  • Collaborate with software engineers to establish reliability-first design patterns and review architectures for operational risk.
  • Manage Kubernetes or container orchestration environments at scale.
  • Ensure systems meet compliance and security requirements, particularly those applicable to healthcare data (HIPAA, SOC 2).
  • Provide technical mentorship and guidance to engineers across the organization on reliability practices.
  • Participate in on-call rotation with a commitment to continuously reducing the need for it.

Qualifications

  • 7+ years of experience in SRE, platform engineering, or DevOps roles.
  • Exceptional problem-solving under pressure—demonstrated track record of diagnosing complex, high-stakes system failures and building durable solutions.
  • Deep hands-on experience with AWS services including EC2, EKS/ECS, Lambda, RDS, S3, CloudWatch, and related tooling.
  • Familiarity with infrastructure-as-code (Terraform or CloudFormation)—able to contribute to existing configurations.
  • Experience designing and operating distributed systems with strict availability and latency requirements.
  • Proficiency in at least one scripting or systems language (Python, Go, Bash, or similar) for automation and tooling.
  • Experience with container orchestration (Kubernetes, ECS) in production environments.
  • Expertise in observability tooling (OpenSearch, Prometheus/Grafana, or equivalent).
  • Hands-on experience with CI/CD platforms (GitHub Actions, Jenkins, CircleCI, or similar).
  • Proven ability to define and operationalize SLOs and error budgets.
  • Experience with relational and NoSQL databases—performance tuning, replication, and backup strategies.
  • Strong working knowledge of networking fundamentals: DNS, load balancing, VPCs, TLS.
  • Excellent communication skills—able to translate technical risk into business impact for non-engineering stakeholders.

Additional Requirements

  • AWS Certifications (Solutions Architect, DevOps Engineer, or SysOps Administrator).
  • Experience in healthcare technology or other regulated industries (HIPAA, SOC 2, FedRAMP).
  • Familiarity with chaos engineering practices and tooling.
  • Experience with data pipeline reliability (ETL/ELT workflows, streaming systems).
  • Exposure to AI/ML infrastructure and the reliability challenges unique to model serving.
  • Familiarity with additional cloud platforms (Azure, Google Cloud).
  • Contributions to open-source reliability or infrastructure tooling.

Work Environment/Physical Demands

The work environment and physical demands described here are representative of those that must be met by an employee to successfully perform the essential functions of this job. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.

This is a remote position. While performing the duties of this job, the employee regularly works in a climate-controlled environment. Candidates must be able to sit, read, work on a computer, and watch a computer screen for extended periods of time. The employee is occasionally required to stand, walk, use hands and fingers, kneel, or crouch.

Commence is an equal employment opportunity employer. All personnel processes are merit-based and applied without discrimination on the basis of race, color, religion, sex, sexual orientation, gender identity, marital status, age, disability, national or ethnic origin, military and veteran status or any other characteristic protected by applicable law.

Commence.AI is committed to providing equal employment opportunities to all applicants, including individuals with disabilities. If you require a reasonable accommodation to participate in the application process due to a disability, please contact Human Resources using the phone number or email address listed on ziprecruiter.com. Please note that unless you are requesting an accommodation, all applications must be submitted through our online application system.

Vacancy posted 23 days ago