
Platform Support Engineer (US)

$115k - $140k

Lightning AI

Who We Are

Lightning AI is the company behind PyTorch Lightning. Founded in 2019, we build an end-to-end platform for developing, training, and deploying AI systems, designed to take ideas from research to production with less friction.

Through our merger with Voltage Park, a neocloud and AI Factory, Lightning AI combines developer-first software with cost-efficient, large-scale compute. Teams get the tools they need for experimentation, training, and production inference, with security, observability, and control built in.

We serve solo researchers, startups, and large enterprises. Lightning AI operates globally with offices in New York City, San Francisco, Seattle, and London, and is backed by Coatue, Index Ventures, Bain Capital Ventures, and Firstminute.

What We're Looking For

Lightning AI is looking to hire a Platform Support Engineer to join our US Customer Experience team, supporting ML engineers running large-scale training and inference workloads across cloud infrastructure, Kubernetes, and GPU platforms in production environments.

This role sits at the intersection of ML systems, cloud infrastructure, Kubernetes, and customers. You'll support engineers training models, deploying inference systems, and scaling GPU workloads in production. You are not a ticket router or traditional support engineer. You are a technical partner to ML teams, helping diagnose failures, improve reliability, and guide customers through complex distributed systems problems.

The problems range from Kubernetes scheduling and GPU orchestration to distributed PyTorch failures, inference latency, networking bottlenecks, storage performance, and platform reliability. You'll gain exposure to a wide variety of real-world AI workloads across industries and help shape the infrastructure powering the next generation of ML applications.
What You'll Do

Work Directly With ML Engineers
  • Partner directly with customer engineering teams running training and inference workloads in production
  • Help customers diagnose and resolve complex distributed systems and ML infrastructure issues
  • Act as a technical advisor during high-impact incidents and platform degradation events
  • Translate infrastructure-level issues into actionable guidance for ML engineers
  • Build credibility with customers through strong technical reasoning and clear communication
Debug ML Infrastructure & Distributed Workloads
  • Investigate failures involving distributed training, Kubernetes orchestration, GPU allocation, networking, and storage systems
  • Troubleshoot PyTorch, CUDA, NCCL, and inference-serving issues
  • Analyze logs, metrics, traces, and system behavior to isolate root causes
  • Debug containerized workloads running across Kubernetes and bare-metal GPU environments
  • Support customers scaling workloads across multi-node GPU systems
  • Diagnose performance bottlenecks involving compute, memory, networking, or storage
Improve Reliability & Platform Operations
  • Identify recurring patterns across customer issues and drive long-term reliability improvements
  • Contribute to post-incident reviews and operational improvements
  • Build internal tooling, automation, documentation, and runbooks
  • Partner closely with infrastructure, networking, and platform engineering teams
  • Help improve observability, operational visibility, and troubleshooting workflows
  • Improve the customer experience through better processes and technical guidance
What This Role Is Not

To set clear expectations:
  • This is not a traditional help desk or ticket routing support role
  • This is not purely customer success or account management
  • This is not a backend engineering role
  • This is not a passive escalation position
This role is for engineers who enjoy solving difficult technical problems while working closely with other engineers.

What You'll Need
Required Qualifications
Infrastructure & Systems
  • Strong software engineering and systems troubleshooting background
  • Experience with Kubernetes and containerized environments
  • Linux systems knowledge, including networking, storage, process management, and performance tuning
  • Experience with cloud infrastructure and distributed systems
  • Experience with observability and debugging tools such as Prometheus, Grafana, or OpenTelemetry
ML Infrastructure Experience
  • Hands-on experience operating machine learning workloads in production or research environments
  • Experience with distributed ML systems and tooling such as PyTorch, CUDA, or NCCL
  • Familiarity with GPU infrastructure and orchestration
  • Experience troubleshooting performance, reliability, or scaling issues in ML infrastructure
  • Understanding of the operational challenges involved in running ML systems at scale
Collaboration
  • Strong communication skills and ability to work directly with highly technical customers and engineering teams
  • Comfortable operating in fast-moving, highly ambiguous environments
  • Enjoys solving complex technical problems collaboratively
Ideal Experience
  • Experience with large-scale model training or distributed inference systems
  • Familiarity with Ray, Kubeflow, Slurm, or similar distributed scheduling platforms
  • Experience with InfiniBand, RDMA, or high-performance networking
  • Experience operating bare-metal infrastructure
  • Familiarity with storage systems commonly used in ML environments
  • Experience working at an AI infrastructure, cloud, MLOps, or developer tooling company
  • Contributions to platform engineering, developer infrastructure, or operational tooling projects
  • Experience writing automation, tooling, or scripts in Python or similar languages
This role is hybrid out of our Seattle or San Francisco offices, with an in-office requirement of at least 2 days per week and occasional team and company offsites. The role follows a Monday-Friday schedule, with working hours from 8:00 AM to 5:00 PM PST. We are not able to provide visa sponsorship for this role at this time.


We are committed to offering competitive compensation that reflects the value each team member brings to our mission. Final offers are based on factors such as experience, skills, geographic location, and role expectations. In addition to base salary, our total rewards package for eligible roles includes a discretionary bonus, a meaningful equity component, and comprehensive benefits.

The anticipated annual base salary range for this role is:

$115,000-$140,000 USD

Benefits and Perks

We offer a comprehensive and competitive benefits package designed to support our employees' health, well-being, and long-term success. Benefits may vary by location, team, and role.

Benefits include:
  • Comprehensive medical, dental and vision coverage (U.S.); Private medical and dental insurance (U.K.)
  • Retirement and financial wellness support (U.S.); Pension contribution (U.K.)
  • Generous paid time off, plus holidays
  • Paid parental leave
  • Professional development support
  • Wellness and work-from-home stipends
  • Flexible work environment

At Lightning AI, we are committed to fostering an inclusive and diverse workplace. We believe that diverse teams drive innovation and create better products. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other protected characteristic. We are dedicated to building a culture where everyone can thrive and contribute to their fullest potential.
Vacancy posted 2 days ago