Director, Site Reliability Engineering
$121.5k - $306.4k
Oracle
Job Description
Provides leadership to one or more teams designing and architecting infrastructure and service and provides input on best practices for reliability and functionality. Establishes direction to ensure accurate forecasting and ensure systems have adequate resources. Builds collaborative relationships with the software development team to create reliable, scalable infrastructures. Ensures alignment regarding data collection and contributes to standards for optimizing operations and infrastructure reliability. Defines approaches for incident response activities to ensure service reliability. Ensures teams provide in-depth health and performance reporting. Plays a key role in developing standards for identifying and recommending automation. Anticipates and explains the impact of changes, mentoring other managers on what to communicate. Defines approaches for escalating incidents and refines methods for documentation. Encourages experimenting with new technology, executing improvements, building site reliability knowledge, and providing clear data.
Responsibilities
Key Responsibilities
Capacity Ingestion and Management:
-Provides leadership for one or more teams designing and architecting infrastructure and/or service, providing input on the development of best practices for adhering to terms for reliability and functionality.
-Establishes direction for other managers and senior-level individuals to drive the forecasting of demands for infrastructure and respond to capacity needs, ensuring that systems have sufficient resources to meet current and future workloads and identifying and addressing resource gaps.
-Builds collaborative relationships with senior software development team members to design and develop infrastructures that are highly reliable and scalable, meeting stringent deployment requirements.
-Ensures teams align on expectations for identifying opportunities for prototyping and oversees prototyping initiatives (e.g., testing new applications or infrastructures, assisting in onboarding), experimenting with cutting-edge approaches.
Incident and Service Lifecycle Management:
-Ensures alignment across teams regarding performing data collection, triage, technical analysis, and redirection, contributing to the development of standards to maintain and optimize operations and infrastructure reliability.
-Shares techniques across teams for monitoring of services, maintaining up-to-date knowledge of their performance, and thoroughly documenting their condition.
-Defines approaches for performing incident response, root cause analysis, and/or maintenance on assigned services (e.g., software installs, version upgrades, security updates, backup and recovery) and drives execution.
-Ensures teams provide in-depth health and performance reporting and coordinates managerial actions based on trends in data.
-Refines procedures for performing provisioning to support infrastructure, applications, and services, mentoring team members.
-Provides input on standards for decommissioning (e.g., shutting down servers, removing data from databases) to remove objects that are no longer needed.
Automation:
-Plays a key role in developing standards for identifying and recommending opportunities for automation and reviewing potential benefits in terms of metrics across teams to ensure expectations are met.
-Ensures alignment on expectations for developing designs, automation tools, or scripts, and drives their implementation.
-Refines strategies for conducting testing on highly complex automations to ensure they perform tasks correctly and produce expected results.
-Provides guidance and expertise to others testing automations.
Technical Communication and Guidance:
-Shares expectations for release notes and communication of in-depth information about the scale, capacity, security, performance attributes, and requirements of services and technology with customers, cross-functional teams and leadership.
-Anticipates and explains the potential impact of infrastructure, feature, and tool changes, considering the strategic implications and goals.
-Takes a leadership role in mentoring other managers on what information to communicate and how to communicate.
Troubleshooting and Resolution:
-Defines approaches, within and across teams, for escalating incidents and other highly complex issues arising in Oracle services.
-Coordinates with other team leaders to review service performance and ensure the resolution of technical issues spanning multiple services and customers, encouraging collaboration across teams and leveraging advanced investigation and debugging techniques to ensure the achievement of SLOs (service level objectives).
-Refines standard reporting methods for incident documentation and performing root cause analyses, aiming to capture insights and lessons learned for continuous improvement and knowledge sharing.
-Plays a key role in creating guidelines for post-mortem procedures to prevent incident recurrence.
-Communicates with other team leaders to ensure adherence to service level agreements (SLAs) made with customers.
Innovation and Improvement:
-Encourages creativity and innovation and coordinates with other leaders to drive the exploration and adoption of innovative tools and technologies to transform infrastructure performance and reliability, investigating implications of adherence to security standards on other integrations.
-Provides input on initiatives to resolve performance bottlenecks and optimize deployments, aligning other leaders on expectations for efficient resource usage, speed, and scalability and driving roadmap development.
-Refines standards for developing and maintaining knowledge of site reliability trends and sharing valuable insights and information cross-functionally to drive innovation in building, testing, deploying, and running services.
-Plays a key role in the review of analyses and data, driving and influencing business development decisions (e.g., design changes).
Core Responsibilities
Planning & Execution:
-Oversees and guides multiple teams on managing complex projects or initiatives, monitoring timelines, deliverables, and budgets when applicable to ensure strategic objectives are met. Serves as a role model for appropriately delegating work, setting priorities, and ensuring alignment with business needs. Coaches others on adjusting resources or project timelines in anticipation of business changes.
Collaboration & Partnership:
-Serves as a role model for leading cross-functional collaborative efforts to ensure alignment of expectations and strategic objectives. Empowers team to build and maintain partnerships with business leaders, stakeholders, and/or customers to address barriers and contribute to organizational success. Drives transparency and inclusivity by modeling how to actively seek, listen to, and leverage diverse perspectives.
Problem Solving:
-Shares problem-solving strategies across teams, providing oversight on complex operational and/or technical issues, as needed. Coaches teams on analyzing highly complex data and/or information to identify solutions to ambiguous issues and provides direction on identifying root causes to prevent recurrence of issues.
Continuous Learning:
-Pursues strategic learning opportunities to maintain expertise and apply best practices at the organizational level. Creates opportunities for team members and leaders to build their expertise in new areas, coaching them to build innovative skills. Identifies skill gap trends across the organization, and upholds a culture that places significant emphasis on sharing knowledge and pursuing learning opportunities that advance the organization. Evaluates efficiency of learning strategies and recommends adjustments as needed.
Continuous Improvement:
-Empowers team to own the development and implementation of ideas that increase the efficiency and effectiveness of processes, protocols, and workflows across the department. Coaches teams to gain buy-in for ideas and to seek feedback on approaches and methods for continued improvement. Prioritizes and reviews the roadmap of improvement initiatives to ensure alignment with strategic direction and maximize return on investments.
Performance and Development:
-Serves as a role model for driving performance across teams through tailored feedback and coaching in alignment with performance management processes, guidelines, and expectations. Drives consistency in the application of talent development procedures and socializes performance expectations across the organization. Ensures that individual development goals are aligned with organizational strategic initiatives. Collaborates with HR to implement talent strategy through hiring and promotion processes.
Disclaimer:
Certain U.S. based or U.S. customer or client-facing roles may be required to comply with applicable requirements, such as immunization/occupational health mandates, and/or drug testing requirements.
Range and benefit information provided in this posting is specific to the stated locations only.
US: Hiring Range in USD from: $121,500 to $306,400 per annum. May be eligible for bonus, equity, and compensation deferral.
Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business.
Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.
Oracle US offers a comprehensive benefits package which includes the following:
Medical, dental, and vision insurance, including expert medical opinion
Short term disability and long term disability
Life insurance and AD&D
Supplemental life insurance (Employee/Spouse/Child)
Health care and dependent care Flexible Spending Accounts
Pre-tax commuter and parking benefits
401(k) Savings and Investment Plan with company match
Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
11 paid holidays
Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
Paid parental leave
Adoption assistance
Employee Stock Purchase Plan
Financial planning and group legal
Voluntary benefits including auto, homeowner and pet insurance
The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.
Career Level - M4
About Us
Only Oracle brings together the data, infrastructure, applications, and expertise to power everything from industry innovations to life-saving care. And with AI embedded across our products and services, we help customers turn that promise into a better future for all. Discover your potential at a company leading the way in AI and cloud solutions that impact billions of lives.
True innovation starts when everyone is empowered to contribute. That’s why we’re committed to growing a workforce that promotes opportunities for all with competitive benefits that support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.
We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by email, or by phone in the United States.
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

