Senior Site Reliability Engineer

$81.1k - $187k

Oracle

Job Description

We are looking for a Site Reliability Engineer 3 to support mission-critical cloud services and production operations. The role focuses on improving service reliability, reducing operational risk, automating repetitive tasks, and driving faster detection and resolution of issues.

The engineer will work closely with development, infrastructure, security, and operations teams to monitor service health, troubleshoot production issues, participate in incident response, improve observability, and implement reliability best practices. This role also includes analyzing recurring failures, building automation, supporting deployments, and contributing to capacity planning, disaster recovery, and operational readiness.

Also works on a number of different region/realm rollouts and deployments. Forecasts demands and responds to capacity needs. Collaborates with software development teams to develop reliable and scalable infrastructures. Performs data collection to maintain and optimize operations and reliability. Leverages knowledge to perform incident response and/or maintenance tasks. Provides health and performance reporting. Identifies opportunities for automation. Communicates about services and identifies and explains the potential impact of changes. Provides support for technology and documents incidents. Experiments with new tools, assesses their potential impact, and develops knowledge of site reliability trends.

Responsibilities

Key Responsibilities

Capacity Ingestion and Management:

-Takes proactive steps to design and architect infrastructure and/or service according to terms for reliability and functionality.

-Forecasts demands for infrastructure and responds to capacity needs, ensuring systems have sufficient resources to handle current and future workloads.

-Collaborates with the software development team to develop infrastructures and features that are reliable and scalable according to deployment requirements.

-Independently identifies opportunities for and drives prototyping (e.g., testing new applications or infrastructures, assisting in onboarding).

Incident and Service Lifecycle Management:

-Performs data collection, triage, technical analysis, and redirection to maintain and optimize operations and infrastructure reliability.

-Independently monitors services, maintains up-to-date knowledge of their performance, and documents their condition.

-Leverages comprehensive knowledge to perform incident response, root cause analyses, and/or maintenance on assigned services (e.g., software installs, version upgrades, security updates, backup and recovery).

-Provides health and performance reporting and takes appropriate actions based on trends in data.

-May independently perform provisioning to support infrastructure, applications, and services.

-May perform standard and non-standard decommissioning (e.g., shutting down servers, removing data from databases) to remove objects that are no longer needed.

Automation:

-Identifies opportunities for automation and assesses potential benefits.

-Develops automation tools or scripts to provide solutions, gather metrics, monitor, analyze, mitigate, or remediate issues/defects within infrastructures.

-Independently conducts testing to ensure automation performs the task correctly and produces expected results.

Technical Communication and Guidance:

-Communicates the scale, capacity, security, performance attributes, and requirements of services and technology within and sometimes beyond the immediate team.

-Identifies and explains the potential impact of infrastructure, feature, and tool changes, considering their impact on team operations.

Troubleshooting and Resolution:

-Provides operational support for technology, escalating incidents and other standard and non-standard issues arising within Oracle services.

-Participates in on-call shifts to address issues.

-Resolves technical issues spanning various services, investigating and debugging products in order to reach SLOs (service level objectives).

-Documents incidents and performs root cause analyses according to standard reporting methods.

-Independently performs post-mortem procedures to prevent incident reoccurrence.

Innovation and Improvement:

-Experiments with new tools and technologies to assess their potential impact on infrastructure performance and reliability and to improve both, ensuring adherence to security standards.

-Independently identifies and executes improvements for performance bottlenecks and deployments to ensure efficient resource usage, speed, and scalability.

-Develops knowledge of site reliability trends and shares new information with team members, management, and beyond to help others build, test, deploy and run services.

-Performs standard and non-standard analyses and provides clear data on production to contribute to business development decisions (e.g., design changes).

Core Responsibilities

Planning & Execution:

Independently manages work, monitoring timelines and deliverables to ensure projects or initiatives stay on track and meet requirements. Proactively prioritizes work and adapts to resource or timeline shifts, suggesting adjustments to maintain project efficiency.

Collaboration & Partnership:

Collaborates across teams to align on expectations and achieve shared objectives. Builds and maintains a comprehensive understanding of business, stakeholder, and/or customer needs to build and support effective partnerships. Actively listens to diverse perspectives and asks questions to ensure understanding of others.

Problem Solving:

Independently identifies and addresses standard and non-standard issues in accordance with standard practices, escalating more complex issues as appropriate. Analyzes data and/or information from multiple sources to troubleshoot standard and non-standard errors. Contributes to knowledge sharing and best practices.

Continuous Learning:

Embraces continuous learning by actively seeking to build knowledge and new skills and/or tools and staying current with industry trends and best practices. Seeks out and leverages feedback and training to improve skills. Contributes to a culture of continuous learning and knowledge sharing with team members.

Continuous Improvement:

Develops ideas and recommends updates to increase the efficiency and effectiveness of processes, protocols, and workflows within a team. Seeks input from team members on alternative approaches and methods for improving work.

IAC: Terraform, Chef, Ansible

Languages: Python, Java, Bash

Orchestration: Kubernetes, Helm

CI/CD: Jenkins

Observability: Grafana, Prometheus

Disclaimer:

Certain U.S. based or U.S. customer or client-facing roles may be required to comply with applicable requirements, such as immunization/occupational health mandates, and/or drug testing requirements.

Range and benefit information provided in this posting is specific to the stated locations only.

US: Hiring Range in USD from: $81,100 to $187,000 per annum. May be eligible for bonus and equity.

Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business.

Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.

Oracle US offers a comprehensive benefits package which includes the following:

  1. Medical, dental, and vision insurance, including expert medical opinion

  2. Short term disability and long term disability

  3. Life insurance and AD&D

  4. Supplemental life insurance (Employee/Spouse/Child)

  5. Health care and dependent care Flexible Spending Accounts

  6. Pre-tax commuter and parking benefits

  7. 401(k) Savings and Investment Plan with company match

  8. Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.

  9. 11 paid holidays

  10. Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.

  11. Paid parental leave

  12. Adoption assistance

  13. Employee Stock Purchase Plan

  14. Financial planning and group legal

  15. Voluntary benefits including auto, homeowner and pet insurance

The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.

Career Level - IC3

About Us

Only Oracle brings together the data, infrastructure, applications, and expertise to power everything from industry innovations to life-saving care. And with AI embedded across our products and services, we help customers turn that promise into a better future for all. Discover your potential at a company leading the way in AI and cloud solutions that impact billions of lives.

True innovation starts when everyone is empowered to contribute. That's why we're committed to growing a workforce that promotes opportunities for all with competitive benefits that support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.

We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by email or by phone in the United States; the contact details are shown on the original posting at click.appcast.io.

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans' status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

Vacancy posted 3 days ago