AI Infrastructure Operations Engineer
Private Health Management
$120k - $140k
Remote
AI Infrastructure & Operations Engineer
Location: Remote (U.S.)
Reports To: Juan Sandoval-Tobias
About Private Health Management
Private Health Management (PHM) supports people with serious and complex medical conditions, helping them obtain the best possible medical care. We guide individuals and families to top specialists, advanced diagnostics, and personalized care. Trusted by healthcare providers and businesses, PHM offers independent, science-backed insights to help clients make informed decisions and access the best care.
About the Role
PHM is building and scaling Companion, an AI-enabled clinical platform operating in a high-trust healthcare environment where reliability, observability, and security are foundational requirements. The platform includes headless AI agents designed to support clinical and operational professionals by acting as intelligent workstations that integrate with enterprise applications and workflows.
The AI Infrastructure & Operations Engineer will operationalize the platform so it runs reliably at production scale, helping ensure the systems behind Companion are observable, recoverable, secure, and maintainable as adoption grows.
This role sits at the intersection of Kubernetes operations, AI platform reliability, observability engineering, and operational security. You will help evolve and maintain the Azure-based infrastructure stack while partnering closely with technology leadership, AI architects, and security stakeholders. This is a high-ownership role for someone who thrives in fast-moving environments, is comfortable operating with incomplete information, and enjoys building operational discipline around emerging AI systems.
What You'll Accomplish
- Establish operational reliability for Companion across AKS infrastructure, AI agent workloads, monitoring systems, and deployment pipelines.
- Build meaningful observability practices that help PHM understand platform behavior, usage trends, and operational risks before they become incidents.
- Create sustainable operational hygiene around patching, CVE remediation, secrets rotation, dependency management, and cloud maintenance cycles.
- Strengthen platform resilience, documentation, and operational processes so the environment can scale without relying on tribal knowledge.
How You'll Spend Your Days
Operate and Improve Platform Reliability
- Monitor and maintain AKS infrastructure, AI agent workloads, deployment pipelines, and supporting Azure services.
- Investigate incidents, troubleshoot production issues, and improve platform resilience through better operational patterns and tooling.
- Support release operations and help ensure deployments remain stable, observable, and recoverable.
Build Observability and Operational Insight
- Develop dashboards, alerts, logging patterns, and operational baselines using Azure Log Analytics and Application Insights.
- Identify system trends, performance bottlenecks, and emerging operational risks across infrastructure and AI workloads.
- Improve visibility into AI agent behavior, enterprise workflow integrations, latency patterns, and system health under real user load.
Strengthen Security and Operational Hygiene
- Maintain operational cadence for dependency updates, CVE remediation, image signing, secrets rotation, and cluster patching.
- Support security-first infrastructure practices across Kubernetes, CI/CD pipelines, and Azure environments.
- Partner with security and engineering stakeholders to maintain compliance-aware operational practices in a HIPAA-regulated environment.
Collaborate Across a Small, High-Ownership Team
- Work closely with technology leadership, platform engineers, security stakeholders, and AI architects to evolve the operational maturity of Companion.
- Contribute documentation, operational runbooks, and shared knowledge that reduce platform fragility over time.
- Help establish practical operational patterns for AI systems where industry best practices are still emerging.
What You Bring to the Table
Required
- Strong hands-on Kubernetes operations experience, including troubleshooting workloads, admission controllers, cluster networking, and production incidents.
- Experience supporting cloud-native infrastructure in Azure environments, particularly AKS and related operational tooling.
- Demonstrated strength in monitoring, observability, and incident response using structured logging and metrics platforms.
- SRE mindset with experience handling on-call responsibilities, operational prioritization, and post-incident analysis.
- Comfort operating in fast-moving environments with incomplete documentation, evolving processes, and broad ownership areas.
- Strong communication and collaboration skills with the ability to explain technical issues clearly across technical and non-technical audiences.
Nice to Have
- Experience with CI/CD pipeline tooling including GitHub Actions, Kaniko, cosign, image signing, or Actions Runner Controller.
- Familiarity with Infrastructure as Code practices using Bicep or Azure resource automation tooling.
- Exposure to HIPAA, SOC 2, or other compliance-aware operational environments.
- Experience supporting AI or LLM-backed applications in production environments.
Compensation
The target base salary for this position is $120,000 - $140,000. This base salary is only one part of a total compensation package that also includes health/dental/vision benefits, an annual cash incentive program, a 401(k) with match, flexible PTO, PHM for PHM (our services for you and your dependents), and other benefits. Individual pay may vary from the target range because several factors, including market forces, experience, location, disparities in market data, and other relevant business considerations, may all factor into final compensation.
This is a remote role requiring that you live in and physically perform all work in the United States.
Next Steps
Private Health Management is a remote company with employees around the United States. We're committed to providing a thoughtful, transparent interview experience and meaningful opportunities to get to know our company, mission, and wonderful teammates through fully remote interviews.
If your application is selected for interviews, you'll hear from a member of our recruiting team to schedule next steps. Interviews will also include the hiring manager, peers, and often an executive from the department.
PHM uses AI-enabled tools at certain points in the recruiting process to help identify and evaluate top talent; however, all hiring decisions are made by human reviewers.
Have a quick question about the role? Email the recruiting team or simply apply.
Anticipated Pay Range
$120,000 - $140,000 USD


