Principal AI Site Reliability Engineer (US REMOTE)
Oracle
$86.4k - $199.5k
Job Description
Join Oracle's Health Data Intelligence (HDI) team as a Software Engineer 4, focused on Site Reliability Engineering for large-scale healthcare analytics platforms. In this role, you will design, build, and operate highly reliable, scalable infrastructure and data pipelines that power mission-critical analytics globally.
You will also contribute to the next evolution of cloud operations by advancing automation, observability, and AI-assisted reliability practices. This includes exploring the use of Generative AI and intelligent automation to improve incident response, system resilience, and operational efficiency.
You will work within a collaborative team to deliver robust solutions that handle massive datasets with precision and performance, while continuously improving system reliability and operational excellence.
U.S. citizenship is required for this position, as the successful candidate will be required to obtain (and maintain) a U.S. government security clearance after hire.
Required Skills
Infrastructure & Reliability
Experience building and operating high-availability, fault-tolerant systems
Strong understanding of distributed systems, performance monitoring, and resiliency patterns
Experience with incident response, root-cause analysis, and production troubleshooting
AI-Native Engineering (NEW)
Hands-on experience applying Generative AI or Agentic AI (e.g., LangChain, AutoGPT, custom agents) to:
Infrastructure lifecycle management
Observability and anomaly detection
Incident response and remediation automation
Ability to design or integrate AI-driven workflows for operational efficiency and reliability
Familiarity with building or integrating autonomous agents for DevOps/SRE use cases
Cloud & Multi-Cloud Ecosystems
Strong experience with multi-cloud environments (OCI, AWS/Azure)
Deep understanding of cloud infrastructure design, deployment, and resource optimization
Experience managing hybrid or cross-cloud architectures
DevOps/SRE Practices
Advanced competency in CI/CD pipelines (Jenkins, Kubernetes)
Infrastructure as Code (Terraform)
Observability tools (Prometheus, Grafana)
Strong focus on automation-first operations
Data Technologies
Proficiency in Data Warehousing platforms (e.g., Vertica, Snowflake)
Experience with ETL frameworks and large-scale data processing
Understanding of columnar storage systems
BI & Reporting
Experience supporting or integrating BI tools (Tableau, Power BI, Oracle Analytics)
Programming & Tools
Strong proficiency in Python, Java, or Go
Experience with Docker, Kubernetes, and shell scripting
Problem-Solving
Strong troubleshooting skills with ability to perform root-cause analysis
Experience resolving complex production issues in distributed systems
Responsibilities
Work with the Site Reliability Engineering (SRE) team to take shared ownership of services and platform components. Develop a strong understanding of end-to-end system architecture, dependencies, and production behavior.
Design, build, and operate reliable, scalable, and secure infrastructure supporting large-scale analytics workloads
Improve system reliability through automation, monitoring, and performance optimization
Contribute to the adoption of AI-assisted approaches for operations, including:
Enhancing observability and alerting
Supporting automated incident detection and remediation
Exploring intelligent automation for infrastructure lifecycle management
Partner with development teams to enhance service architecture, scalability, and operability
Participate in on-call rotations and act as an escalation point for complex production issues
Perform root cause analysis and implement long-term fixes to prevent recurrence
Apply knowledge of distributed systems to troubleshoot issues and optimize system performance
Drive continuous improvement in DevOps/SRE practices, including CI/CD, Infrastructure as Code, and automation at scale
Develop & Maintain
Implement and optimize infrastructure for Oracle HDI Analytics Platform
Ensure system uptime, reliability, and scalability
AI-Driven Automation (NEW)
Design and implement GenAI-powered or agent-based solutions for:
Observability and anomaly detection
Incident triage and remediation
Infrastructure provisioning and lifecycle management
Build tools and frameworks that enable self-service and autonomous operations
Data Pipeline Execution
Build and optimize scalable data pipelines using Vertica and ETL frameworks
Operational Excellence
Apply DevOps/SRE practices to automate deployments and operations
Enhance observability using Prometheus/Grafana and AI-driven insights
Cloud Integration
Support multi-cloud initiatives across OCI, AWS, and Azure
Optimize cost, performance, and compliance across environments
Incident Response
Participate in on-call rotations
Implement preventative and automated remediation solutions
Collaboration
Work closely with engineers to execute technical roadmaps
Contribute to code reviews and infrastructure improvements
What You Bring
10+ years of software engineering experience, with 8+ years in cloud infrastructure, SRE, or DevOps
Proven ownership of production system reliability in cloud environments
Core Expertise
Cloud infrastructure design and automation
Distributed systems and performance optimization
Data warehousing and ETL frameworks
AI-Native Experience
Demonstrated experience applying GenAI / LLMs / agentic frameworks to infrastructure or operations
Experience building or integrating AI-powered automation for DevOps/SRE workflows
Familiarity with tools like LangChain, AutoGPT, or custom AI agents
Technical Skills
Terraform, Docker, Kubernetes
Observability stacks (Prometheus, Grafana)
Python, Java, or Go
Additional Strengths
Strong problem-solving mindset with a focus on automation and scalability
Experience improving system reliability through intelligent automation
Preferred Qualifications
Experience in healthcare or regulated environments (HIPAA, compliance frameworks)
Familiarity with Oracle HDI or large-scale analytics platforms
Experience working in environments requiring security clearance
Experience building self-healing or autonomous infrastructure systems
Why Join Oracle HDI?
Own and shape AI-native SRE and automation strategy for a mission-critical platform
Work on large-scale, data-intensive healthcare systems
Be part of Oracle's investment in AI-driven infrastructure and healthcare innovation
Build the future of autonomous, self-healing cloud platforms
Collaborate with top-tier engineers solving complex, real-world problems
Career Level - IC4
Disclaimer:
Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.
Range and benefit information provided in this posting is specific to the stated locations only.
US: Hiring Range in USD from: $86,400 to $199,500 per annum. May be eligible for bonus and equity.
Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business.
Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.
Oracle US offers a comprehensive benefits package which includes the following:
Medical, dental, and vision insurance, including expert medical opinion
Short term disability and long term disability
Life insurance and AD&D
Supplemental life insurance (Employee/Spouse/Child)
Health care and dependent care Flexible Spending Accounts
Pre-tax commuter and parking benefits
401(k) Savings and Investment Plan with company match
Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
11 paid holidays
Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
Paid parental leave
Adoption assistance
Employee Stock Purchase Plan
Financial planning and group legal
Voluntary benefits including auto, homeowner and pet insurance
The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.
About Us
Only Oracle brings together the data, infrastructure, applications, and expertise to power everything from industry innovations to life-saving care. And with AI embedded across our products and services, we help customers turn that promise into a better future for all. Discover your potential at a company leading the way in AI and cloud solutions that impact billions of lives.
True innovation starts when everyone is empowered to contribute. That's why we're committed to growing a workforce that promotes opportunities for all with competitive benefits that support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.
We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing [email address not shown] or by calling [phone number not shown] in the United States.
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans' status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.


