AI Infrastructure Operations Engineer

$120k - $140k

Private Health Management

Remote

AI Infrastructure & Operations Engineer

Location: Remote (U.S.)
Reports To: Juan Sandoval-Tobias

About Private Health Management

Private Health Management (PHM) supports people with serious and complex medical conditions, helping them obtain the best possible medical care. We guide individuals and families to top specialists, advanced diagnostics, and personalized care. Trusted by healthcare providers and businesses, PHM offers independent, science-backed insights to help clients make informed decisions and access the best care.

About the Role

PHM is building and scaling Companion, an AI-enabled clinical platform operating in a high-trust healthcare environment where reliability, observability, and security are foundational requirements. The platform includes headless AI agents designed to support clinical and operational professionals by acting as intelligent workstations that integrate with enterprise applications and workflows.

The AI Infrastructure & Operations Engineer will operationalize the platform so it runs reliably at production scale, helping ensure the systems behind Companion are observable, recoverable, secure, and maintainable as adoption grows.

This role sits at the intersection of Kubernetes operations, AI platform reliability, observability engineering, and operational security. You will help evolve and maintain the Azure-based infrastructure stack while partnering closely with technology leadership, AI architects, and security stakeholders. This is a high-ownership role for someone who thrives in fast-moving environments, is comfortable operating with incomplete information, and enjoys building operational discipline around emerging AI systems.

What You'll Accomplish
  • Establish operational reliability for Companion across AKS infrastructure, AI agent workloads, monitoring systems, and deployment pipelines.
  • Build meaningful observability practices that help PHM understand platform behavior, usage trends, and operational risks before they become incidents.
  • Create sustainable operational hygiene around patching, CVE remediation, secrets rotation, dependency management, and cloud maintenance cycles.
  • Strengthen platform resilience, documentation, and operational processes so the environment can scale without relying on tribal knowledge.

How You'll Spend Your Days

Operate and Improve Platform Reliability

  • Monitor and maintain AKS infrastructure, AI agent workloads, deployment pipelines, and supporting Azure services.
  • Investigate incidents, troubleshoot production issues, and improve platform resilience through better operational patterns and tooling.
  • Support release operations and help ensure deployments remain stable, observable, and recoverable.

Build Observability and Operational Insight

  • Develop dashboards, alerts, logging patterns, and operational baselines using Azure Log Analytics and Application Insights.
  • Identify system trends, performance bottlenecks, and emerging operational risks across infrastructure and AI workloads.
  • Improve visibility into AI agent behavior, enterprise workflow integrations, latency patterns, and system health under real user load.

Strengthen Security and Operational Hygiene

  • Maintain operational cadence for dependency updates, CVE remediation, image signing, secrets rotation, and cluster patching.
  • Support security-first infrastructure practices across Kubernetes, CI/CD pipelines, and Azure environments.
  • Partner with security and engineering stakeholders to maintain compliance-aware operational practices in a HIPAA-regulated environment.

Collaborate Across a Small, High-Ownership Team

  • Work closely with technology leadership, platform engineers, security stakeholders, and AI architects to evolve the operational maturity of Companion.
  • Contribute documentation, operational runbooks, and shared knowledge that reduce platform fragility over time.
  • Help establish practical operational patterns for AI systems where industry best practices are still emerging.

What You Bring to the Table

Required

  • Strong hands-on Kubernetes operations experience, including troubleshooting workloads, admission controllers, cluster networking, and production incidents.
  • Experience supporting cloud-native infrastructure in Azure environments, particularly AKS and related operational tooling.
  • Demonstrated strength in monitoring, observability, and incident response using structured logging and metrics platforms.
  • SRE mindset with experience handling on-call responsibilities, operational prioritization, and post-incident analysis.
  • Comfort operating in fast-moving environments with incomplete documentation, evolving processes, and broad ownership areas.
  • Strong communication and collaboration skills with the ability to explain technical issues clearly across technical and non-technical audiences.

Nice to Have

  • Experience with CI/CD pipeline tooling including GitHub Actions, Kaniko, cosign, image signing, or Actions Runner Controller.
  • Familiarity with Infrastructure as Code practices using Bicep or Azure resource automation tooling.
  • Exposure to HIPAA, SOC 2, or other compliance-aware operational environments.
  • Experience supporting AI or LLM-backed applications in production environments.

Compensation

The target base salary for this position is $120,000 - $140,000. Base salary is only one part of a total compensation package that also includes health, dental, and vision benefits; an annual cash incentive program; a 401(k) with match; flexible PTO; PHM for PHM (our services for you and your dependents); and other benefits. Individual pay may vary from the target range based on several factors, including market forces, experience, location, disparities in market data, and other relevant business considerations.

This is a remote role requiring that you live in and physically perform all work in the United States.

Next Steps

Private Health Management is a remote company with employees around the United States. We're committed to providing a thoughtful, transparent interview experience and meaningful opportunities to get to know our company, mission, and wonderful teammates through fully remote interviews.

If your application is selected for interviews, you'll hear from a member of our recruiting team to schedule next steps. Interviews will also include the hiring manager, peers, and often an executive from the department.

PHM uses AI-enabled tools at certain points in the recruiting process to help identify and evaluate top talent; however, all hiring decisions are made by human reviewers.


Anticipated Pay Range

$120,000 - $140,000 USD

Vacancy posted 3 days ago