Senior AI Infrastructure Engineer — Remote, GPU-Scale

€66.5k - €104.5k per year

Phoenix Court Group

Remote job

Sword Health is shifting healthcare from human-first to AI-first through its AI Care platform, making world-class healthcare available anytime, anywhere, while significantly reducing costs for payers, self-insured employers, national health systems, and other healthcare organizations. Sword began by reinventing pain care with AI at its core, and has since expanded into women’s health, movement health, and more recently mental health. Since 2020, more than 700,000 members across three continents have completed 10 million AI sessions, helping Sword's 1,000+ enterprise clients avoid over $1 billion in unnecessary healthcare costs. Backed by 42 clinical studies and over 44 patents, Sword Health has raised more than $500 million from leading investors, including Khosla Ventures, General Catalyst, Transformation Capital, and Founders Fund.

As a Senior AI Infrastructure Engineer at Sword Health, you will own the infrastructure that brings our AI models to life in production. From optimizing LLM inference and deploying real-time voice AI agents to scaling GPU clusters that serve millions of sessions, your work will directly power the AI Care platform that is transforming healthcare worldwide.

You will sit at the intersection of ML and infrastructure - designing systems that power real-time computer vision for movement analysis, serve large language models for conversational AI, and enable low-latency voice interactions for AI agents. You'll ensure our models run at the speed and scale our members expect. This is not a traditional DevOps role; you'll be deeply embedded in AI-specific challenges like inference optimization, real-time video processing, model serving at scale, and GPU workload orchestration.

If you're passionate about pushing the boundaries of AI infrastructure performance and want to do it in a mission-driven environment where your work directly improves people's health outcomes, we'd love to have you on our team.

What you'll be doing:
Design, build, and maintain the inference infrastructure that powers Sword Health's AI products, ensuring models are served with high throughput, low latency, and cost efficiency.
Own the end-to-end deployment pipeline for AI models - from real-time computer vision powering movement analysis to large language models driving conversational AI experiences.
Architect and scale Kubernetes clusters for GPU-accelerated workloads, including autoscaling strategies, resource scheduling, and multi-model serving.
Build and operate the infrastructure behind Sword Health's real-time AI agents, including WebRTC cluster provisioning and deploying speech-to-text and text-to-speech capabilities at low latency.
Drive inference scaling strategies - evaluate and implement techniques such as speculative decoding, continuous batching, and model parallelism to meet growing demand without proportionally increasing costs.
Develop and maintain Infrastructure as Code (Terraform) and GitOps workflows tailored to GPU-enabled, AI-specific environments.
Instrument and monitor AI inference systems, building observability around GPU utilization, model latency, throughput, and error rates to ensure reliability and performance.
Collaborate closely with ML Engineers, Data Scientists, and Product teams to translate model requirements into robust, production-ready infrastructure.
Evaluate emerging AI infrastructure tools, frameworks, and hardware to keep Sword Health at the cutting edge of inference performance and efficiency.
Mentor team members on AI infrastructure best practices, fostering knowledge sharing around GPU workloads, model serving patterns, and production ML systems.

What you need to have:
5+ years of experience in infrastructure engineering, with at least 2 years focused on AI/ML workloads in production environments.
Strong experience with Kubernetes for orchestrating GPU-accelerated workloads, including scheduling, resource management, and autoscaling for inference services.
Hands-on experience with model serving and inference optimization frameworks for both real-time computer vision and large language model workloads.
Solid understanding of LLM inference optimization techniques, including speculative decoding, batching strategies, quantization, and inference scaling patterns.
Experience provisioning and managing infrastructure for real-time AI systems, including WebRTC clusters and AI agent architectures.
Familiarity with real-time video/computer vision inference pipelines and the infrastructure challenges of processing continuous visual data streams at low latency.
Familiarity with speech-to-text and text-to-speech serving infrastructure and the challenges of running voice AI at low latency.
Experience with Infrastructure as Code (Terraform or similar) and GitOps methodologies for managing complex, GPU-enabled environments.
Working knowledge of GPU infrastructure - NVIDIA CUDA ecosystem, multi-GPU setups, and GPU monitoring/profiling.
Strong Linux systems fundamentals and networking knowledge, particularly for latency-sensitive, real-time workloads.
Fluent in English (written and oral).
A proactive, ownership-driven mindset - you see a bottleneck in an inference pipeline and you fix it before it becomes a problem.

What we would love to see:

AI Inference & Model Serving:
Experience with LLM serving engines such as vLLM, SGLang, or LLM-D.
Experience with NVIDIA Triton Inference Server and TensorRT for real-time computer vision workloads.
Familiarity with NVIDIA Riva or similar platforms for STT/TTS serving.
Understanding of speculative decoding, continuous batching, quantization, and model parallelism techniques.

Kubernetes & Infrastructure:
Experience with Istio or similar service mesh.
Experience with Kafka for event streaming.
Experience with Prometheus, AlertManager, and Grafana for monitoring and observability.
Experience with Elasticsearch, Logstash, and Kibana (ELK) for log management.
Experience with Vault for secrets management.
Experience with Redis, MySQL, and DNS management.
Experience provisioning infrastructure on AWS, Azure, or GCP.
Good knowledge of cloud networking, including VPC management, routing, NAT, and troubleshooting with tools like tcpdump.

General:
Experience with WebRTC infrastructure and real-time media streaming.
Experience with Python, Go, or similar languages commonly used in ML infrastructure tooling.
Familiarity with Scrum methodology.

To ensure you feel good solving a big Human problem, we offer:
A stimulating, fast-paced environment with lots of room for creativity;
A bright future at a promising high-tech startup company;
Career development and growth, with a competitive salary;
The opportunity to work with a talented team and to add real value to an innovative solution with the potential to change the future of healthcare;
A flexible environment where you can control your hours (remotely) with unlimited vacation;
Access to our health and well-being program (digital therapist sessions);
Remote or Hybrid work policy.

€66,500 - €104,500 a year
*This range includes base, variable and equity.

These compensation bands are just the starting point. Once someone joins and proves they’re outlier talent, we adjust quickly to ensure their compensation aligns with their impact. Our job titles may span more than one career level. Actual pay is determined by skills, qualifications, experience, location, market demand, and other factors. Compensation details listed in this posting reflect the base salary and any potential variable, bonus or sales incentives, and the Company’s estimation of the value of private company stock options, if applicable.
The pay range is subject to change, future value of company stock options is not guaranteed, and compensation may be modified in the future. In addition to our total compensation, Sword offers a number of benefits, as listed in this posting.

Sword Health complies with applicable Federal and State civil rights laws and does not discriminate on the basis of Age, Ancestry, Color, Citizenship, Gender, Gender expression, Gender identity, Gender information, Marital status, Medical condition, National origin, Physical or mental disability, Pregnancy, Race, Religion, Caste, Sexual orientation, and Veteran status.


Vacancy posted 4 days ago

Similar jobs that could be interesting for you
Based on the Senior AI Infrastructure Engineer — Remote, GPU-Scale in New Bremen, OH vacancy

Senior AI Infra Networking Engineer | High-Perf GPU Cloud
$100k - $200k
...Nscale is seeking a Senior Network Engineer for AI Infrastructure to ensure the operational health of their high-speed... ...maintaining and optimizing large-scale Infiniband and RoCE fabrics. The ideal... ...with Python. Join a thriving remote-first environment at a cutting-edge...
Remote work
Senior
Nscale
New York, NY
3 days ago
Senior AI Platform Engineer - Remote, Scale Enterprise AI
...A leading AI procurement firm is seeking a Senior AI Platform Engineer to build and scale their core platform. The role requires strong backend development skills, proficiency in Node.js and React, and experience with AI systems. You'll work alongside founding engineers...
Remote work
Senior
Negotiateai Inc.
New York, NY
3 days ago
Senior AI Engineer Scale AI Platform (Remote)
...Jitterbit is seeking a Senior AI Engineer to join our innovative team in the United States. The role focuses on building AI capabilities on... ...fun and performance-oriented culture that allows for effective remote work. Join us to contribute to our mission of transforming business...
Remote work
Senior
Jitterbit
New York, NY
3 days ago
Senior AI Backend Engineer - Platform & Scale (Remote)
$170k - $200k
...Alumni Ventures is seeking a Senior Backend Engineer focused on AI. This remote position offers a salary of US $170k–200k with equity options. You will design and build shared AI infrastructure, ensuring reliability and performance of AI systems. The ideal candidate has...
Remote work
Senior
Alumni Ventures
New York, NY
1 day ago
Remote Senior ML/AI Platform Engineer - Scale & Deploy AI
$200k - $245k
BetterHelp is seeking a Staff ML/AI Platform Engineer in San Jose, CA. This role focuses on platform engineering, machine learning, and software... ..., and mentoring team members. The position supports remote work with occasional travel. Competitive compensation ranges...
Remote job
Senior
Nerdleveltech
San Jose, CA
4 days ago
Senior AI Platform Engineer, Core Cloud Engineering
$110k - $140k
...performance cloud infrastructure easy to use, affordable... ...enterprises and AI innovators around... ...Compute, Cloud GPU, Bare Metal, and Cloud... ...$500 stipend for remote office setup in... ...experienced AI Platform Engineer to own the... ...— at non‑trivial scale. Strong Docker and...
Remote work
Senior
Work at office
Immediate start
Flexible hours
Vultr
New York, NY
3 days ago
Senior Cloud GPU AI Infrastructure Engineer
...Consulting Member of Technical Staff for its AI Infrastructure team in Austin, Texas. This role focuses on building high-performance GPU platforms and overseeing the software... .../MS in Computer Science, 6+ years in large-scale systems, and proficiency in programming languages...
Senior
Flexible hours
Oracle
Austin, TX
3 days ago
Senior GPU Infrastructure Engineer
...a mission to democratize AI by breaking down the barriers... ..., we offer an innovative GPU marketplace and AI... ...the Role We're seeking a Senior Infrastructure Engineer to help build and scale Hyperbolic's GPU Cloud Marketplace... ...IPMI/Redfish, BMC-based remote management, PXE boot, and...
Remote work
Senior
Hyperbolic Labs
San Francisco, CA
3 days ago
Senior Software Engineer, Fabric Networking - GPU
$152k - $241.5k
...Computing and Visualization. The GPU, our invention, serves as... ...for highly motivated Senior Software Engineers to work on our GPU Fabric Networking... ...software to support large scale computing platforms. Work... .... NVIDIA uses AI tools in its recruiting processes...
Remote work
Senior
NVIDIA
United States
4 days ago
Remote AI Infrastructure Engineer: GPU & ML Platform Lead
Bright Vision Technologies is looking for an AI Infrastructure Engineer to design and operate infrastructure that supports large-scale AI workloads. The role is entirely remote and requires expertise in GPU clusters, distributed training, and performance optimization. Ideal...
Remote job
Full time
Bright Vision Technologies
Edison, NJ
3 days ago
Remote GPU Cloud Platform Engineer: Scale AI Compute
...A pioneering AI infrastructure company is seeking a GPU Cloud Platform Engineer to design and operate large-scale GPU clusters. This remote position aims to ensure high availability and performance of containerized AI workloads across cloud environments. The ideal candidate...
Remote work
Yotta Labs
New York, NY
3 days ago
Senior AI Platform Engineer Remote (LATAM)
...A leading tech recruitment firm is seeking a Senior AI Platform Engineer to build the core AI infrastructure for a fast-scaling SaaS company. You will lead the development of... ...an emphasis on async programming. This fully remote role emphasizes innovation and high impact,...
Remote work
Senior
Talentcross
Topeka, KS
4 days ago
Senior Technical Recruiter - AI Infrastructure and Engineering
$60 - $70 per hour
...Senior Technical Recruiter - AI Infrastructure and Engineering Remote, US $60 - $70 Job Title Senior Technical Recruiter - AI Infrastructure and Engineering Location... ...the Mission We support companies building large-scale AI systems—performance-optimized, sustainability...
Remote work
Senior
The Leadership Agency Inc.
New York, NY
3 days ago
Senior AI Compute Platform Engineer (Remote)
$286.2k - $326.7k
Capital One is seeking a Senior Distinguished Engineer, AI Compute based in Richmond, VA. This remote-eligible role involves architecting and scaling foundational capabilities for an enterprise AI and ML platform. Candidates should have a bachelor's degree, 7+ years of...
Remote work
Senior
Capital One
Richmond, VA
17 hours ago
Infrastructure Engineer - AI-Scale GPU & Cloud (Remote)
$170k - $220k
...Boston, MA is seeking an experienced infrastructure manager to support large-scale AI workloads. This role involves designing and optimizing GPU and cloud infrastructure for efficient... ...Exceptional candidates may be considered for remote work. Subconscious...
Remote job
Subconscious Systems Technologies, Inc.
Boston, MA
2 days ago
Senior ML Infrastructure Engineer - GPU & Scale
...technology company is seeking a Senior Machine Learning Engineer to build and operate systems that power large-scale machine learning training.... ...role includes designing ML infrastructure, optimizing performance,... ...in SLURM, Kubernetes, and GPU workloads. The company offers...
Senior
Flexible hours
TensorWave
Las Vegas, NV
2 days ago
Senior AI Infrastructure Engineer
...the Scientific Data and AI company. We are... ...compute, cloud, data, and AI infrastructure have converged on TetraScience... ...Do We’re looking for a Senior AI Platform Engineer to help design, build, and scale our AI and data... ...working arrangements – Remote work Company paid Life...
Remote work
Senior
Immediate start
Flexible hours
TetraScience
New York, NY
3 days ago
Senior AI Platform Engineer - Supernal
$50 per hour
...Location: Remote (Global) Reports to: Head of Product... ...SMBs hire their first AI employee. Our AI teammates... .... Our AI Platform Engineers, known internally as Masons... ..., we’re looking for a Senior Mason to help lead this... ...systems are delivered at scale. This is a hands‑on...
Remote work
Senior
For contractors
Infinity
New York, NY
17 hours ago
Senior AI Platform Engineer Remote, LLM & Agents
...technology company in Seattle is seeking a seasoned software engineer with over 10 years of experience, specifically in... ...This role involves shaping how the organization builds and scales agentic AI while ensuring reliability, governance, and security across...
Remote work
Senior
F5 Networks
Seattle, WA
17 hours ago
Senior AI Engineer Data Infrastructure Multimodal Models 100% Remote
...About the job We’re seeking experienced AI infrastructure Engineers to design and implement robust,... ...development. Responsibilities Build and scale high‑throughput data infrastructure optimized... ...content processing across large GPU clusters (e.g., H100/H200). Design core...
Remote work
Senior
Framework Ventures
United States
2 days ago
Senior AI Platform Engineer (Python + AWS)
...About the job Senior AI Platform Engineer (Python + AWS) Remote | LATAM | Full-Time One of TalentCross's most innovative and... ...help build the backbone of their AI infrastructure. This is a rare opportunity to work with a fast-scaling SaaS company that's transforming a legacy...
Remote work
Senior
Full time
Talentcross
Topeka, KS
4 days ago
Senior AI Infrastructure Engineer: GPU Clusters & LLM Ops
$136.5k - $253.5k
Cadence is seeking a highly skilled AI Systems Engineer to join their team in San Jose, CA. This hands-on, senior role will lead the AI infrastructure development, including architecting high-performance GPU clusters and deploying advanced AI models. Ideal candidates will...
Senior
Cadence
San Jose, CA
3 days ago
Senior Rust & Cloud-Scale Engineer: Edge to Cloud
...firm in the United States is seeking a Senior or Staff Software Engineer to join their distributed systems team. The role involves designing and scaling distributed systems and leading the... ...offers competitive salaries, equity, and remote work flexibility, along with a strong...
Remote work
Senior
Ditto
New York, NY
3 days ago
Senior AI Platform Engineer at Negotiateai Inc. Remote
...Senior AI Platform Engineer job at Negotiateai Inc.. Remote. Location Canada Remote, United States Remote Employment Type Full time Location Type Remote Department... ...with the founding engineers. You’ll help build and scale the core platform across backend services, AI...
Remote work
Senior
Full time
Negotiateai Inc.
New York, NY
3 days ago
Senior AI Platform Engineer
$175k - $215k
...Senior AI Platform Engineer Boulder, CO or remote Ideal start timeline: July 2026 Role status: Exempt Compensation: Our target hiring range is $... ...roll out AI-first development and CI/CD flows, that scale to the needs of high-velocity agentic coding Be a...
Remote work
Senior
Summer work
Work at office
Flexible hours
CampMinder
Boulder, CO
2 days ago
Senior AI Networking Engineer: ML-Driven DSE for LLM scale
A leading technology company in Seattle seeks a Senior Software Engineer to join their AI Networking team. This role involves building ML tools for optimizing... ...AI workloads across data centers, focusing on large-scale deep learning. Candidates should have a PhD or...
Senior
NVIDIA Corporation
Seattle, WA
4 days ago
AI Platform Engineer, Senior
$86.8k - $198k
...Job Description Remote Work: No Job Number: R0234072... ...Share job via: Share AI Platform Engineer, Senior The Opportunity: As an AI Platform... ...designing, building, and operating large-scale production systems Experience in...
Remote work
Senior
Full time
Contract work
Part time
Work at office
Local area
Booz Allen Hamilton
United States
1 day ago
Remote AI Infrastructure Engineer - GPU, Kubernetes & Cloud
...thinking software development company is seeking an AI Infrastructure Engineer to join their dynamic team remotely. This full-time position focuses on AI/ML... ...strong skills in cloud platforms, Kubernetes, and GPU computing. The ideal candidate has 3-5 years of relevant...
Remote work
Full time
Bright Vision Technologies
Orlando, FL
17 hours ago
Senior Platform AI Engineer
$192k - $259.8k
...Drata AI Platform Team Role Drata's AI Platform team builds the production infrastructure that powers AI features across our compliance platform... ...agent developers, product engineers, and an embedded SRE... ...manage spend as AI workloads scale. Platform Enablement...
Remote work
Senior
Flexible hours
Drata Inc
United States
1 day ago
Senior AI Platform Engineer
$93.2k - $174.2k
...Senior AI Platform Engineer The Senior AI Platform Engineer is responsible for the technical design,... ...Both are classified as predominantly remote with occasional visits to campus as required... ...0 annually. However, the expected pay scale for this position is up to $135,000...
Remote work
Senior
For contractors
Worldwide
Work visa
University of California
United States
2 days ago
