AI Platform Support Engineer (US)

$115k - $140k

Lightning AI

Platform Support Engineer (US)

San Francisco, California, United States; Seattle, Washington, United States

Who We Are

Lightning AI is the company behind PyTorch Lightning. Founded in 2019, we build an end-to-end platform for developing, training, and deploying AI systems—designed to take ideas from research to production with less friction.

Through our merger with Voltage Park, a neocloud and AI Factory, Lightning AI combines developer-first software with cost-efficient, large-scale compute. Teams get the tools they need for experimentation, training, and production inference, with security, observability, and control built in.

We serve solo researchers, startups, and large enterprises. Lightning AI operates globally with offices in New York City, San Francisco, Seattle, and London, and is backed by Coatue, Index Ventures, Bain Capital Ventures, and Firstminute.

What We're Looking For

Lightning AI is looking to hire a Platform Support Engineer to join our US Customer Experience team, supporting ML engineers running large-scale training and inference workloads across cloud infrastructure, Kubernetes, and GPU platforms in production environments.

This role sits at the intersection of ML systems, cloud infrastructure, Kubernetes, and customers. You'll support engineers training models, deploying inference systems, and scaling GPU workloads in production. You are not a ticket router or traditional support engineer. You are a technical partner to ML teams, helping diagnose failures, improve reliability, and guide customers through complex distributed systems problems.

The problems range from Kubernetes scheduling and GPU orchestration to distributed PyTorch failures, inference latency, networking bottlenecks, storage performance, and platform reliability. You'll gain exposure to a wide variety of real-world AI workloads across industries and help shape the infrastructure powering the next generation of ML applications.

What You'll Do

Work Directly With ML Engineers

  • Partner directly with customer engineering teams running training and inference workloads in production
  • Help customers diagnose and resolve complex distributed systems and ML infrastructure issues
  • Act as a technical advisor during high-impact incidents and platform degradation events
  • Translate infrastructure-level issues into actionable guidance for ML engineers
  • Build credibility with customers through strong technical reasoning and clear communication

Debug ML Infrastructure & Distributed Workloads

  • Investigate failures involving distributed training, Kubernetes orchestration, GPU allocation, networking, and storage systems
  • Troubleshoot PyTorch, CUDA, NCCL, and inference-serving issues
  • Analyze logs, metrics, traces, and system behavior to isolate root causes
  • Debug containerized workloads running across Kubernetes and bare-metal GPU environments
  • Support customers scaling workloads across multi-node GPU systems
  • Diagnose performance bottlenecks involving compute, memory, networking, or storage

Improve Reliability & Platform Operations

  • Identify recurring patterns across customer issues and drive long-term reliability improvements
  • Contribute to post-incident reviews and operational improvements
  • Build internal tooling, automation, documentation, and runbooks
  • Partner closely with infrastructure, networking, and platform engineering teams
  • Help improve observability, operational visibility, and troubleshooting workflows
  • Improve the customer experience through better processes and technical guidance

What This Role Is Not

To set clear expectations:

  • This is not a traditional help desk or ticket-routing support role
  • This is not purely customer success or account management
  • This is not a backend engineering role
  • This is not a passive escalation position

This role is for engineers who enjoy solving difficult technical problems while working closely with other engineers.

What You'll Need

Required Qualifications

Infrastructure & Systems

  • Strong software engineering and systems troubleshooting background
  • Experience with Kubernetes and containerized environments
  • Linux systems knowledge, including networking, storage, process management, and performance tuning
  • Experience with cloud infrastructure and distributed systems
  • Experience with observability and debugging tools such as Prometheus, Grafana, or OpenTelemetry

ML Infrastructure Experience

  • Hands-on experience operating machine learning workloads in production or research environments
  • Experience with distributed ML systems and tooling such as PyTorch, CUDA, or NCCL
  • Familiarity with GPU infrastructure and orchestration
  • Experience troubleshooting performance, reliability, or scaling issues in ML infrastructure
  • Understanding of the operational challenges involved in running ML systems at scale

Collaboration

  • Strong communication skills and ability to work directly with highly technical customers and engineering teams
  • Comfortable operating in fast-moving, highly ambiguous environments
  • Enjoys solving complex technical problems collaboratively

Ideal Experience

  • Experience with large-scale model training or distributed inference systems
  • Familiarity with Ray, Kubeflow, Slurm, or similar distributed scheduling platforms
  • Experience with InfiniBand, RDMA, or high-performance networking
  • Experience operating bare-metal infrastructure
  • Familiarity with storage systems commonly used in ML environments
  • Experience working at an AI infrastructure, cloud, MLOps, or developer tooling company
  • Contributions to platform engineering, developer infrastructure, or operational tooling projects
  • Experience writing automation, tooling, or scripts in Python or similar languages

This role is hybrid out of our Seattle or San Francisco offices, with an in-office requirement of at least 2 days per week and occasional team and company offsites. The role follows a Monday–Friday schedule, with working hours from 8:00 AM to 5:00 PM PST. We are not able to provide visa sponsorship for this role at this time.

We are committed to offering competitive compensation that reflects the value each team member brings to our mission. Final offers are based on factors such as experience, skills, geographic location, and role expectations. In addition to base salary, our total rewards package for eligible roles includes a discretionary bonus, a meaningful equity component, and comprehensive benefits.

The anticipated annual base salary range for this role is:

$115,000 - $140,000 USD

Benefits and Perks

We offer a comprehensive and competitive benefits package designed to support our employees' health, well-being, and long-term success. Benefits may vary by location, team, and role.

Benefits include:

  • Comprehensive medical, dental and vision coverage (U.S.); Private medical and dental insurance (U.K.)
  • Retirement and financial wellness support (U.S.); Pension contribution (U.K.)
  • Generous paid time off, plus holidays
  • Paid parental leave
  • Professional development support
  • Wellness and work-from-home stipends
  • Flexible work environment

At Lightning AI, we are committed to fostering an inclusive and diverse workplace. We believe that diverse teams drive innovation and create better products. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other protected characteristic. We are dedicated to building a culture where everyone can thrive and contribute to their fullest potential.

Vacancy posted 5 hours ago
