Principal AI Site Reliability Engineer (US REMOTE)

$86.4k - $199.5k

Oracle

Job Description

Join Oracle's Health Data Intelligence (HDI) team as a Software Engineer 4, focused on Site Reliability Engineering for large-scale healthcare analytics platforms. In this role, you will design, build, and operate highly reliable, scalable infrastructure and data pipelines that power mission-critical analytics globally.

You will also contribute to the next evolution of cloud operations by advancing automation, observability, and AI-assisted reliability practices. This includes exploring the use of Generative AI and intelligent automation to improve incident response, system resilience, and operational efficiency.

You will work within a collaborative team to deliver robust solutions that handle massive datasets with precision and performance, while continuously improving system reliability and operational excellence.

U.S. citizenship is required for this position, as the successful candidate will be required to obtain (and maintain) a U.S. government security clearance after hire.

Required Skills

Infrastructure & Reliability

  • Experience building and operating high-availability, fault-tolerant systems

  • Strong understanding of distributed systems, performance monitoring, and resiliency patterns

  • Experience with incident response, root-cause analysis, and production troubleshooting

AI-Native Engineering (NEW)

  • Hands-on experience applying Generative AI or Agentic AI (e.g., LangChain, AutoGPT, custom agents) to:

      • Infrastructure lifecycle management

      • Observability and anomaly detection

      • Incident response and remediation automation

  • Ability to design or integrate AI-driven workflows for operational efficiency and reliability

  • Familiarity with building or integrating autonomous agents for DevOps/SRE use cases

Cloud & Multi-Cloud Ecosystems

  • Strong experience with multi-cloud environments (OCI, AWS/Azure)

  • Deep understanding of cloud infrastructure design, deployment, and resource optimization

  • Experience managing hybrid or cross-cloud architectures

DevOps/SRE Practices

  • Advanced competency in CI/CD pipelines (Jenkins, Kubernetes)

  • Infrastructure as Code (Terraform)

  • Observability tools (Prometheus, Grafana)

  • Strong focus on automation-first operations

Data Technologies

  • Proficiency in Data Warehousing platforms (e.g., Vertica, Snowflake)

  • Experience with ETL frameworks and large-scale data processing

  • Understanding of columnar storage systems

BI & Reporting

  • Experience supporting or integrating BI tools (Tableau, Power BI, Oracle Analytics)

Programming & Tools

  • Strong proficiency in Python, Java, or Go

  • Experience with Docker, Kubernetes, and shell scripting

Problem-Solving

  • Strong troubleshooting skills with ability to perform root-cause analysis

  • Experience resolving complex production issues in distributed systems

Responsibilities

Work with the Site Reliability Engineering (SRE) team to take shared ownership of services and platform components. Develop a strong understanding of end-to-end system architecture, dependencies, and production behavior.

  • Design, build, and operate reliable, scalable, and secure infrastructure supporting large-scale analytics workloads

  • Improve system reliability through automation, monitoring, and performance optimization

  • Contribute to the adoption of AI-assisted approaches for operations, including:

      • Enhancing observability and alerting

      • Supporting automated incident detection and remediation

      • Exploring intelligent automation for infrastructure lifecycle management

  • Partner with development teams to enhance service architecture, scalability, and operability

  • Participate in on-call rotations and act as an escalation point for complex production issues

  • Perform root cause analysis and implement long-term fixes to prevent recurrence

  • Apply knowledge of distributed systems to troubleshoot issues and optimize system performance

  • Drive continuous improvement in DevOps/SRE practices, including CI/CD, Infrastructure as Code, and automation at scale

Develop & Maintain

  • Implement and optimize infrastructure for Oracle HDI Analytics Platform

  • Ensure system uptime, reliability, and scalability

AI-Driven Automation (NEW)

  • Design and implement GenAI-powered or agent-based solutions for:

      • Observability and anomaly detection

      • Incident triage and remediation

      • Infrastructure provisioning and lifecycle management

  • Build tools and frameworks that enable self-service and autonomous operations

Data Pipeline Execution

  • Build and optimize scalable data pipelines using Vertica and ETL frameworks

Operational Excellence

  • Apply DevOps/SRE practices to automate deployments and operations

  • Enhance observability using Prometheus/Grafana and AI-driven insights

Cloud Integration

  • Support multi-cloud initiatives across OCI, AWS, and Azure

  • Optimize cost, performance, and compliance across environments

Incident Response

  • Participate in on-call rotations

  • Implement preventative and automated remediation solutions

Collaboration

  • Work closely with engineers to execute technical roadmaps

  • Contribute to code reviews and infrastructure improvements

What You Bring

  • 10+ years of software engineering experience, with 8+ years in cloud infrastructure, SRE, or DevOps

  • Proven ownership of production system reliability in cloud environments

Core Expertise

  • Cloud infrastructure design and automation

  • Distributed systems and performance optimization

  • Data warehousing and ETL frameworks

AI-Native Experience

  • Demonstrated experience applying GenAI / LLMs / agentic frameworks to infrastructure or operations

  • Experience building or integrating AI-powered automation for DevOps/SRE workflows

  • Familiarity with tools like LangChain, AutoGPT, or custom AI agents

Technical Skills

  • Terraform, Docker, Kubernetes

  • Observability stacks (Prometheus, Grafana)

  • Python, Java, or Go

Additional Strengths

  • Strong problem-solving mindset with a focus on automation and scalability

  • Experience improving system reliability through intelligent automation

Preferred Qualifications

  • Experience in healthcare or regulated environments (HIPAA, compliance frameworks)

  • Familiarity with Oracle HDI or large-scale analytics platforms

  • Experience working in environments requiring security clearance

  • Experience building self-healing or autonomous infrastructure systems

Why Join Oracle HDI?

  • Own and shape AI-native SRE and automation strategy for a mission-critical platform

  • Work on large-scale, data-intensive healthcare systems

  • Be part of Oracle's investment in AI-driven infrastructure and healthcare innovation

  • Build the future of autonomous, self-healing cloud platforms

  • Collaborate with top-tier engineers solving complex, real-world problems

Career Level - IC4

Disclaimer:

Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.

Range and benefit information provided in this posting is specific to the stated locations only.

US: Hiring Range in USD from: $86,400 to $199,500 per annum. May be eligible for bonus and equity.

Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business.

Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.

Oracle US offers a comprehensive benefits package which includes the following:

  1. Medical, dental, and vision insurance, including expert medical opinion

  2. Short term disability and long term disability

  3. Life insurance and AD&D

  4. Supplemental life insurance (Employee/Spouse/Child)

  5. Health care and dependent care Flexible Spending Accounts

  6. Pre-tax commuter and parking benefits

  7. 401(k) Savings and Investment Plan with company match

  8. Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.

  9. 11 paid holidays

  10. Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.

  11. Paid parental leave

  12. Adoption assistance

  13. Employee Stock Purchase Plan

  14. Financial planning and group legal

  15. Voluntary benefits including auto, homeowner and pet insurance

The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.

About Us

Only Oracle brings together the data, infrastructure, applications, and expertise to power everything from industry innovations to life-saving care. And with AI embedded across our products and services, we help customers turn that promise into a better future for all. Discover your potential at a company leading the way in AI and cloud solutions that impact billions of lives.

True innovation starts when everyone is empowered to contribute. That's why we're committed to growing a workforce that promotes opportunities for all with competitive benefits that support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.

We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing [email address not shown] or by calling [phone number not shown] in the United States.

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans' status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

Vacancy posted 1 day ago