Director, Site Reliability Engineering
$121.5k - $306.4k
Oracle
Job Description
Provides leadership to one or more teams designing and architecting infrastructure and services, and provides input on best practices for reliability and functionality. Establishes direction to ensure accurate demand forecasting and adequate system resources. Builds collaborative relationships with the software development team to create reliable, scalable infrastructure. Ensures alignment on data collection and contributes to standards for optimizing operations and infrastructure reliability. Defines approaches for incident response to ensure service reliability. Ensures teams provide in-depth health and performance reporting. Plays a key role in developing standards for identifying and recommending automation opportunities. Anticipates and explains the impact of changes, and mentors other managers on what to communicate. Defines approaches for escalating incidents and refines methods for documentation. Encourages experimentation with new technology, execution of improvements, growth of site reliability knowledge, and the provision of clear data.
Responsibilities
Key Responsibilities
Capacity Ingestion and Management:
-Provides leadership for one or more teams designing and architecting infrastructure and/or services, providing input on the development of best practices for meeting reliability and functionality requirements.
-Establishes direction for other managers and senior-level individuals to drive the forecasting of demands for infrastructure and respond to capacity needs, ensuring that systems have sufficient resources to meet current and future workloads and identifying and addressing resource gaps.
-Builds collaborative relationships with senior software development team members to design and develop infrastructures that are highly reliable and scalable, meeting stringent deployment requirements.
-Ensures teams align on expectations for identifying prototyping opportunities, and oversees prototyping initiatives (e.g., testing new applications or infrastructures, assisting in onboarding) that experiment with cutting-edge approaches.
Incident and Service Lifecycle Management:
-Ensures alignment across teams on data collection, triage, technical analysis, and redirection, contributing to the development of standards to maintain and optimize operations and infrastructure reliability.
-Shares techniques across teams for monitoring services, maintaining up-to-date knowledge of their performance, and thoroughly documenting their condition.
-Defines approaches for performing incident response, root cause analysis, and/or maintenance on assigned services (e.g., software installs, version upgrades, security updates, backup and recovery) and drives execution.
-Ensures teams provide in-depth health and performance reporting and coordinates managerial actions based on trends in data.
-Refines provisioning procedures that support infrastructure, applications, and services, and mentors team members on them.
-Provides input on standards for decommissioning (e.g., shutting down servers, removing data from databases) to remove objects that are no longer needed.
Automation:
-Plays a key role in developing standards for identifying and recommending opportunities for automation and reviewing potential benefits in terms of metrics across teams to ensure expectations are met.
-Ensures alignment on expectations for developing designs, automation tools, and scripts, and drives their implementation.
-Refines strategies for conducting testing on highly complex automations to ensure they perform tasks correctly and produce expected results.
-Provides guidance and expertise to others testing automations.
Technical Communication and Guidance:
-Shares expectations for release notes and for communicating in-depth information about the scale, capacity, security, performance attributes, and requirements of services and technology to customers, cross-functional teams, and leadership.
-Anticipates and explains the potential impact of infrastructure, feature, and tool changes, considering the strategic implications and goals.
-Takes a leadership role in mentoring other managers on what information to communicate and how to communicate it.
Troubleshooting and Resolution:
-Defines approaches for escalating incidents and other highly complex issues that arise in Oracle services, both within and across teams.
-Coordinates with other team leaders to review service performance and ensure the resolution of technical issues spanning multiple services and customers, encouraging collaboration across teams and leveraging advanced investigation and debugging techniques to ensure the achievement of SLOs (service level objectives).
-Refines standard reporting methods for incident documentation and performing root cause analyses, aiming to capture insights and lessons learned for continuous improvement and knowledge sharing.
-Plays a key role in creating guidelines for post-mortem procedures to prevent incident recurrence.
-Communicates with other team leaders to ensure adherence to service level agreements (SLAs) made with customers.
Innovation and Improvement:
-Encourages creativity and innovation and coordinates with other leaders to drive the exploration and adoption of innovative tools and technologies to transform infrastructure performance and reliability, investigating implications of adherence to security standards on other integrations.
-Provides input on initiatives to address performance bottlenecks and optimize deployments, aligning other leaders on expectations for efficient resource usage, speed, and scalability, and driving roadmap development.
-Refines standards for developing and maintaining knowledge of site reliability trends and sharing valuable insights and information cross-functionally to drive innovation in building, testing, deploying, and running services.
-Plays a key role in the review of analyses and data, driving and influencing business development decisions (e.g., design changes).
Core Responsibilities
Planning & Execution:
-Oversees and guides multiple teams on managing complex projects or initiatives, monitoring timelines, deliverables, and budgets when applicable to ensure strategic objectives are met. Serves as a role model for appropriately delegating work, setting priorities, and ensuring alignment with business needs. Coaches others on adjusting resources or project timelines in anticipation of business changes.
Collaboration & Partnership:
-Role models leading cross-functional collaborative efforts to ensure alignment of expectations and strategic objectives. Empowers team to build and maintain partnerships with business leaders, stakeholders, and/or customers to address barriers and contribute to organizational success. Drives transparency and inclusivity by modeling actively seeking, listening to, and leveraging diverse perspectives.
Problem Solving:
-Shares problem-solving strategies across teams, providing oversight on complex operational and/or technical issues, as needed. Coaches teams on analyzing highly complex data and/or information to identify solutions to ambiguous issues and provides direction on identifying root causes to prevent recurrence of issues.
Continuous Learning:
-Pursues strategic learning opportunities to maintain expertise and apply best practices at the organizational level. Creates opportunities for team members and leaders to build their expertise in new areas, coaching them to build innovative skills. Identifies skill gap trends across the organization, and upholds a culture that places significant emphasis on sharing knowledge and pursuing learning opportunities that advance the organization. Evaluates efficiency of learning strategies and recommends adjustments as needed.
Continuous Improvement:
-Empowers team to own the development and implementation of ideas that increase the efficiency and effectiveness of processes, protocols, and workflows across the department. Coaches teams to gain buy-in for ideas and to seek feedback on approaches and methods for continued improvement. Prioritizes and reviews the roadmap of improvement initiatives to ensure alignment with strategic direction and maximize return on investment.
Performance and Development:
-Serves as a role model for driving performance across teams through tailored feedback and coaching in alignment with performance management processes, guidelines, and expectations. Drives consistency in the application of talent development procedures and socializes performance expectations across the organization. Ensures that individual development goals are aligned with organizational strategic initiatives. Collaborates with HR to implement talent strategy through hiring and promotion processes.
Disclaimer:
Certain U.S. based or U.S. customer or client-facing roles may be required to comply with applicable requirements, such as immunization/occupational health mandates, and/or drug testing requirements.
Range and benefit information provided in this posting is specific to the stated locations only.
US: Hiring Range in USD from: $121,500 to $306,400 per annum. May be eligible for bonus, equity, and compensation deferral.
Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business.
Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.
Oracle US offers a comprehensive benefits package which includes the following:
Medical, dental, and vision insurance, including expert medical opinion
Short-term and long-term disability
Life insurance and AD&D
Supplemental life insurance (Employee/Spouse/Child)
Health care and dependent care Flexible Spending Accounts
Pre-tax commuter and parking benefits
401(k) Savings and Investment Plan with company match
Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
11 paid holidays
Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
Paid parental leave
Adoption assistance
Employee Stock Purchase Plan
Financial planning and group legal
Voluntary benefits including auto, homeowner and pet insurance
The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.
Career Level - M4
About Us
Only Oracle brings together the data, infrastructure, applications, and expertise to power everything from industry innovations to life-saving care. And with AI embedded across our products and services, we help customers turn that promise into a better future for all. Discover your potential at a company leading the way in AI and cloud solutions that impact billions of lives.
True innovation starts when everyone is empowered to contribute. That’s why we’re committed to growing a workforce that promotes opportunities for all with competitive benefits that support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.
We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by email or by phone in the United States.
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.