Senior Site Reliability Engineer
$81.1k - $187k
Oracle
Job Description
We are looking for a Site Reliability Engineer 3 to support mission-critical cloud services and production operations. The role focuses on improving service reliability, reducing operational risk, automating repetitive tasks, and driving faster detection and resolution of issues.
The engineer will work closely with development, infrastructure, security, and operations teams to monitor service health, troubleshoot production issues, participate in incident response, improve observability, and implement reliability best practices. This role also includes analyzing recurring failures, building automation, supporting deployments, and contributing to capacity planning, disaster recovery, and operational readiness.
Also works on a number of region/realm rollouts and deployments. Forecasts demands and responds to capacity needs. Collaborates with software development teams to develop reliable and scalable infrastructures. Performs data collection to maintain and optimize operations and reliability. Leverages knowledge to perform incident response and/or maintenance tasks. Provides health and performance reporting. Identifies opportunities for automation. Communicates about services and identifies and explains the potential impact of changes. Provides support for technology and documents incidents. Experiments with new tools, assesses their potential impact, and develops knowledge of site reliability trends.
Responsibilities
Key Responsibilities
Capacity Ingestion and Management:
-Takes proactive steps to design and architect infrastructure and/or service according to terms for reliability and functionality.
-Forecasts demands for infrastructure and responds to capacity needs, ensuring systems have sufficient resources to handle current and future workloads.
-Collaborates with the software development team to develop infrastructures and features that are reliable and scalable according to deployment requirements.
-Independently identifies opportunities for and drives prototyping (e.g., testing new applications or infrastructures, assisting in onboarding).
Incident and Service Lifecycle Management:
-Performs data collection, triage, technical analysis, and redirection to maintain and optimize operations and infrastructure reliability.
-Independently monitors services, maintains up-to-date knowledge of their performance, and documents their condition.
-Leverages comprehensive knowledge to perform incident response, root cause analyses, and/or maintenance on assigned services (e.g., software installs, version upgrades, security updates, backup and recovery).
-Provides health and performance reporting and takes appropriate actions based on trends in data.
-May independently perform provisioning to support infrastructure, applications, and services.
-May perform standard and non-standard decommissioning (e.g., shutting down servers, removing data from databases) to remove objects that are no longer needed.
Automation:
-Identifies opportunities for automation and assesses potential benefits.
-Develops automation tools or scripts to provide solutions, gather metrics, monitor, analyze, mitigate, or remediate issues/defects within infrastructures.
-Independently conducts testing to ensure automation performs the task correctly and produces expected results.
Technical Communication and Guidance:
-Communicates the scale, capacity, security, performance attributes, and requirements of services and technology within and sometimes beyond the immediate team.
-Identifies and explains the potential impact of infrastructure, feature, and tool changes, considering their impact on team operations.
Troubleshooting and Resolution:
-Provides operational support for technology, escalating incidents and other standard and non-standard issues arising within Oracle services.
-Participates in on-call shifts to address issues.
-Resolves technical issues spanning various services, investigating and debugging products in order to reach SLOs (service level objectives).
-Documents incidents and performs root cause analyses according to standard reporting methods.
-Independently performs post-mortem procedures to prevent incident recurrence.
Innovation and Improvement:
-Experiments with new tools and technologies to assess their potential impact on and improve infrastructure performance and reliability, ensuring adherence to security standards.
-Independently identifies and executes improvements for performance bottlenecks and deployments to ensure efficient resource usage, speed, and scalability.
-Develops knowledge of site reliability trends and shares new information with team members, management, and beyond to help others build, test, deploy and run services.
-Performs standard and non-standard analyses and provides clear data on production to contribute to business development decisions (e.g., design changes).
Core Responsibilities
Planning & Execution:
Independently manages work, monitoring timelines and deliverables to ensure projects or initiatives stay on track and meet requirements. Proactively prioritizes work and adapts to resource or timeline shifts, suggesting adjustments to maintain project efficiency.
Collaboration & Partnership:
Collaborates across teams to align on expectations and achieve shared objectives. Builds and maintains a comprehensive understanding of business, stakeholder, and/or customer needs to build and support effective partnerships. Actively listens to diverse perspectives and asks questions to ensure understanding of others.
Problem Solving:
Independently identifies and addresses standard and non-standard issues in accordance with standard practices, escalating more complex issues as appropriate. Analyzes data and/or information from multiple sources to troubleshoot standard and non-standard errors. Contributes to knowledge sharing and best practices.
Continuous Learning:
Embraces continuous learning by actively seeking to build knowledge and new skills and/or tools and staying current with industry trends and best practices. Seeks out and leverages feedback and training to improve skills. Contributes to a culture of continuous learning and knowledge sharing with team members.
Continuous Improvement:
Develops ideas and recommends updates to increase the efficiency and effectiveness of processes, protocols, and workflows within a team. Seeks input from team members on alternative approaches and methods for improving work.
IaC: Terraform, Chef, Ansible
Languages: Python, Java, Bash
Orchestration: Kubernetes, Helm
CI/CD: Jenkins
Observability: Grafana, Prometheus
Disclaimer:
Certain U.S. based or U.S. customer or client-facing roles may be required to comply with applicable requirements, such as immunization/occupational health mandates, and/or drug testing requirements.
Range and benefit information provided in this posting are specific to the stated locations only.
US: Hiring Range in USD from: $81,100 to $187,000 per annum. May be eligible for bonus and equity.
Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business.
Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.
Oracle US offers a comprehensive benefits package which includes the following:
Medical, dental, and vision insurance, including expert medical opinion
Short term disability and long term disability
Life insurance and AD&D
Supplemental life insurance (Employee/Spouse/Child)
Health care and dependent care Flexible Spending Accounts
Pre-tax commuter and parking benefits
401(k) Savings and Investment Plan with company match
Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
11 paid holidays
Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
Paid parental leave
Adoption assistance
Employee Stock Purchase Plan
Financial planning and group legal
Voluntary benefits including auto, homeowner and pet insurance
The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.
Career Level - IC3
About Us
Only Oracle brings together the data, infrastructure, applications, and expertise to power everything from industry innovations to life-saving care. And with AI embedded across our products and services, we help customers turn that promise into a better future for all. Discover your potential at a company leading the way in AI and cloud solutions that impact billions of lives.
True innovation starts when everyone is empowered to contribute. That's why we're committed to growing a workforce that promotes opportunities for all with competitive benefits that support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.
We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by email or by phone (in the United States).
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans' status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.