Senior Site Reliability Engineer
Oracle
Job Description
We are looking for a Site Reliability Engineer 3 to support mission-critical cloud services and production operations. The role focuses on improving service reliability, reducing operational risk, automating repetitive tasks, and driving faster detection and resolution of issues.
The engineer will work closely with development, infrastructure, security, and operations teams to monitor service health, troubleshoot production issues, participate in incident response, improve observability, and implement reliability best practices. This role also includes analyzing recurring failures, building automation, supporting deployments, and contributing to capacity planning, disaster recovery, and operational readiness.
The engineer also works on a number of region/realm rollouts and deployments. Forecasts demand and responds to capacity needs. Collaborates with software development teams to develop reliable and scalable infrastructure. Performs data collection to maintain and optimize operations and reliability. Leverages knowledge to perform incident response and/or maintenance tasks. Provides health and performance reporting. Identifies opportunities for automation. Communicates about services and identifies and explains the potential impact of changes. Provides support for technology and documents incidents. Experiments with new tools, assesses their potential impact, and develops knowledge of site reliability trends.
Responsibilities
Key Responsibilities
Capacity Ingestion and Management:
-Takes proactive steps to design and architect infrastructure and/or service according to terms for reliability and functionality.
-Forecasts demands for infrastructure and responds to capacity needs, ensuring systems have sufficient resources to handle current and future workloads.
-Collaborates with the software development team to develop infrastructures and features that are reliable and scalable according to deployment requirements.
-Independently identifies opportunities for and drives prototyping (e.g., testing new applications or infrastructures, assisting in onboarding).
Incident and Service Lifecycle Management:
-Performs data collection, triage, technical analysis, and redirection to maintain and optimize operations and infrastructure reliability.
-Independently monitors services, maintains up-to-date knowledge of their performance, and documents their condition.
-Leverages comprehensive knowledge to perform incident response, root cause analyses, and/or maintenance on assigned services (e.g., software installs, version upgrades, security updates, backup and recovery).
-Provides health and performance reporting and takes appropriate actions based on trends in data.
-May independently perform provisioning to support infrastructure, applications, and services.
-May perform standard and non-standard decommissioning (e.g., shutting down servers, removing data from databases) to remove objects that are no longer needed.
Automation:
-Identifies opportunities for automation and assesses potential benefits.
-Develops automation tools or scripts to provide solutions, gather metrics, monitor, analyze, mitigate, or remediate issues/defects within infrastructures.
-Independently conducts testing to ensure automation performs the task correctly and produces expected results.
Technical Communication and Guidance:
-Communicates the scale, capacity, security, performance attributes, and requirements of services and technology within and sometimes beyond the immediate team.
-Identifies and explains the potential impact of infrastructure, feature, and tool changes, considering their impact on team operations.
Troubleshooting and Resolution:
-Provides operational support for technology, escalating incidents and other standard and non-standard issues arising within Oracle services.
-Participates in on-call shifts to address issues.
-Resolves technical issues spanning various services, investigating and debugging products in order to reach SLOs (service level objectives).
-Documents incidents and performs root cause analyses according to standard reporting methods.
-Independently performs post-mortem procedures to prevent incident recurrence.
Innovation and Improvement:
-Experiments with new tools and technologies to assess their potential impact on and improve infrastructure performance and reliability, ensuring adherence to security standards.
-Independently identifies and executes improvements for performance bottlenecks and deployments to ensure efficient resource usage, speed, and scalability.
-Develops knowledge of site reliability trends and shares new information with team members, management, and beyond to help others build, test, deploy and run services.
-Performs standard and non-standard analyses and provides clear data on production to contribute to business development decisions (e.g., design changes).
Core Responsibilities
Planning & Execution:
Independently manages work, monitoring timelines and deliverables to ensure projects or initiatives stay on track and meet requirements. Proactively prioritizes work and adapts to resource or timeline shifts, suggesting adjustments to maintain project efficiency.
Collaboration & Partnership:
Collaborates across teams to align on expectations and achieve shared objectives. Builds and maintains a comprehensive understanding of business, stakeholder, and/or customer needs to build and support effective partnerships. Actively listens to diverse perspectives and asks questions to ensure understanding of others.
Problem Solving:
Independently identifies and addresses standard and non-standard issues in accordance with standard practices, escalating more complex issues as appropriate. Analyzes data and/or information from multiple sources to troubleshoot standard and non-standard errors. Contributes to knowledge sharing and best practices.
Continuous Learning:
Embraces continuous learning by actively seeking to build knowledge and new skills and/or tools and staying current with industry trends and best practices. Seeks out and leverages feedback and training to improve skills. Contributes to a culture of continuous learning and knowledge sharing with team members.
Continuous Improvement:
Develops ideas and recommends updates to increase the efficiency and effectiveness of processes, protocols, and workflows within a team. Seeks input from team members on alternative approaches and methods for improving work.
IaC: Terraform, Chef, Ansible
Languages: Python, Java, Bash
Orchestration: Kubernetes, Helm
CI/CD: Jenkins
Observability: Grafana, Prometheus
Minimum Job Qualifications
Education and/or Experience:
8 years of experience in software engineering, infrastructure management, or related field
OR
Bachelor's Degree in Computer Science, Engineering, or related field AND 4 years of experience in software engineering, infrastructure management, or related field
OR
Master's Degree in Computer Science, Engineering, or related field AND 2 years of experience in software engineering, infrastructure management, or related field
OR
Doctorate in Computer Science, Engineering, or related field
Job Skills:
Same skills as prior level, plus:
Operating Systems:
Demonstrated ability in or knowledge of operating systems, including installing, upgrading, and troubleshooting various operating environments.
Automation Experience:
3 years of experience in automation.
Programming Experience:
3 years of experience in programming and/or scripting.
Preferred Job Qualifications
Education and/or Experience:
9 years of experience in software engineering, infrastructure management, or related field
OR
Bachelor's Degree in Computer Science, Engineering, or related field AND 5 years of experience in software engineering, infrastructure management, or related field
OR
Master's Degree in Computer Science, Engineering, or related field AND 3 years of experience in software engineering, infrastructure management, or related field
OR
Doctorate in Computer Science, Engineering, or related field AND 1 year of experience in software engineering, infrastructure management, or related field.
Automation Experience:
5 years of experience in automation.
Programming Experience:
5 years of experience in programming and/or scripting.
About Us
Only Oracle brings together the data, infrastructure, applications, and expertise to power everything from industry innovations to life-saving care. And with AI embedded across our products and services, we help customers turn that promise into a better future for all. Discover your potential at a company leading the way in AI and cloud solutions that impact billions of lives.
True innovation starts when everyone is empowered to contribute. That’s why we’re committed to growing a workforce that promotes opportunities for all with competitive benefits that support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.
We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing View email address on click.appcast.io or by calling View phone number on click.appcast.io in the United States.
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

