Systems Engineer - Site Reliability Engineering

Full-time

Marriott

JOB SUMMARY:

The Systems Engineer - Site Reliability Engineering (SRE) is responsible for the reliability, scalability, and performance of mission-critical cloud and on-prem services that support millions of Marriott customers globally. This role involves overseeing incident management, driving automation efforts, and working closely with cross-functional teams to ensure alignment between SRE strategy and business objectives. The engineer partners closely with Product teams, Applications teams, Infrastructure, and the broader Applications and Infrastructure Delivery teams to develop key metrics and KPIs that improve application stability, availability, and performance. The ideal candidate will bring strong communication skills, collaborating with key stakeholders across the company to optimize cloud infrastructure and uphold the highest standards of operational excellence in a dynamic, fast-paced environment.

CANDIDATE PROFILE:

Required:
  • Undergraduate degree in an engineering or computer science discipline and/or equivalent experience/certification
  • 5+ years of hands-on experience in designing, building, and operating production-grade systems, including:
    • 2+ years of experience as a Site Reliability Engineer (SRE), building and managing highly available, mission-critical systems
    • Deep understanding of SRE practices, such as Service Level Objectives, Error Budgets, Toil Management, Observability & Monitoring, Blameless Postmortems, Incident Response Process, Capacity Planning
    • Expertise in AWS services, including designing highly available, multi-AZ and multi-region architectures, for example:
      • Compute: EC2, Auto Scaling, Lambda
      • Containers: EKS (mandatory), ECS (good to have)
      • Networking: VPC, subnets, route tables, NAT gateways, Transit Gateway
      • Security: IAM roles/policies, KMS, Secrets Manager
      • Storage and Databases: S3, EBS, EFS, RDS, DocumentDB
  • Proven automation and programming experience in one or more of the following languages: Python, PowerShell
  • Experience using modern, continuous development techniques and pipelines (e.g. Agile, Kanban, Jira, CI/CD, Helm, Harness, Jenkins, Git, Artifactory, Vault)
  • Experience designing and implementing end-to-end observability solutions across metrics, logs, and traces using tools like Prometheus, Grafana, ELK Stack, and OpenTelemetry
  • Hands-on experience with Linux administration (RHEL, Ubuntu, CentOS, Amazon Linux)
  • Experience troubleshooting API-related issues in distributed systems, including latency, authentication/authorization failures, rate limiting, and upstream/downstream dependency failures
  • Experience with container orchestration engines such as Kubernetes (EKS, AKS, ACK)
  • Familiarity with service mesh technologies to enable secure and resilient service communication, including mTLS, traffic shaping, and policy enforcement
  • Familiarity with Infrastructure as Code (IaC) tools like Terraform and CloudFormation
  • Familiarity with configuration management and automation tools such as Ansible
  • Familiarity with vulnerability management, OS hardening, patching, and security compliance of infrastructure, applications, and databases
  • Understanding of basic networking fundamentals

Preferred:
  • Experience driving cloud cost optimization initiatives (rightsizing, reserved instances, autoscaling strategies, cost observability)
  • Networking expertise, including Load Balancing, Firewalls, Security Groups, NACLs, TCP/IP, DNS, SSL/TLS, etc.

CORE WORK ACTIVITIES:

  • Ensure the reliability, availability, and performance of mission-critical cloud services, implementing best practices for monitoring, alerting, and incident management.
  • Oversee the management of high-severity incidents, driving quick resolution and post-incident analysis to identify root causes and prevent recurrence.
  • Drive the automation of operational processes and ensure systems can scale effectively to support growing user demand, optimizing cloud and on-prem infrastructure and resource usage.
  • Develop and execute the SRE strategy aligned with business goals, and communicate service health, reliability, and performance metrics to senior leadership and stakeholders.

Drive Applications Performance Management and Monitoring:
  • Assess application architectures to identify key monitoring points.
  • Identify Key Performance Indicators, apply monitoring, and report out on compliance.
  • Gather information to develop reporting metrics and KPIs.
  • Ensure that all applications adhere to appropriate monitoring standards based on their technology/business process.
  • Determine forums and cadence to provide regular monitoring updates.

Building Successful Relationships:
  • Collaborates with Enterprise Application and Architecture and Infrastructure teams to continuously improve processes and procedures.
  • Liaises with vendors and Service Providers to select services and tools that best meet company goals.

Managing Projects and Priorities:
  • Develops specific goals and plans to prioritize, organize, and accomplish work.
  • Champions leaders’ vision for product and service delivery.
  • Executes the necessary decisions to keep moving forward toward achievement of goals.
  • Determines priorities, schedules, plans, and necessary resources to promote completion of any projects on schedule.

Delivering on the Needs of Key Stakeholders:
  • Understands and meets the needs of key stakeholders.
  • Communicates concepts in a clear and persuasive manner that is easy to understand.
  • Demonstrates an understanding of business priorities.
  • Supports achievement of performance goals, budget goals, team goals, etc.

Providing Technical Support and Consultation:
  • Provides technical expertise within own and other teams.
  • Provides recommendations to improve the effectiveness of processes and programs.
  • Demonstrates advanced knowledge of job-relevant issues, products, systems, and processes.
  • Keeps up-to-date technically and applies new knowledge to job.
  • Performs other reasonable duties as required for this position.

At Marriott International, we are dedicated to being an equal opportunity employer, welcoming all and providing access to opportunity. We actively foster an environment where the unique backgrounds of our associates are valued and celebrated. Our greatest strength lies in the rich blend of culture, talent, and experiences of our associates. We are committed to non-discrimination on any protected basis, including disability, veteran status, or other basis protected by applicable law.

All positions offer a 401(k) plan, stock purchase plan, discounts at Marriott properties, commuter benefits, employee assistance plan, and childcare discounts. Benefits are subject to terms and conditions, which may include rules regarding eligibility, enrollment, waiting period, contribution, benefit limits, election changes, benefit exclusions, and others. Full-time positions also offer coverage for medical, dental, vision, health care flexible spending account, dependent care flexible spending account, life insurance, disability insurance, accident insurance, adoption expense reimbursements, paid parental leave and educational assistance.

Washington Applicants Only: Employees will accrue paid sick leave, 0.077 PTO balance for every hour worked and be eligible to receive a minimum of 9 holidays annually.

Marriott HQ is committed to a hybrid work environment that enables associates to Be connected. Headquarters-based positions are considered hybrid, for candidates within a commuting distance to Bethesda, MD; candidates outside of commuting distance to Bethesda, MD will be considered for Remote positions.

Marriott International is the world’s largest hotel company, with more brands, more hotels and more opportunities for associates to grow and succeed. Be where you can do your best work, begin your purpose, belong to an amazing global team, and become the best version of you.

Vacancy posted 1 day ago