Platform Support Engineer (US)
Lightning AI
$115k - $140k
San Francisco, California, United States; Seattle, Washington, United States
Who We Are
Lightning AI is the company behind PyTorch Lightning. Founded in 2019, we build an end-to-end platform for developing, training, and deploying AI systems—designed to take ideas from research to production with less friction.
Through our merger with Voltage Park, a neocloud and AI Factory, Lightning AI combines developer-first software with cost-efficient, large-scale compute. Teams get the tools they need for experimentation, training, and production inference, with security, observability, and control built in.
We serve solo researchers, startups, and large enterprises. Lightning AI operates globally with offices in New York City, San Francisco, Seattle, and London, and is backed by Coatue, Index Ventures, Bain Capital Ventures, and Firstminute.
What We're Looking For
Lightning AI is looking to hire a Platform Support Engineer to join our US Customer Experience team, supporting ML engineers running large-scale training and inference workloads across cloud infrastructure, Kubernetes, and GPU platforms in production environments.
This role sits at the intersection of ML systems, cloud infrastructure, Kubernetes, and customers. You'll support engineers training models, deploying inference systems, and scaling GPU workloads in production. You are not a ticket router or traditional support engineer. You are a technical partner to ML teams, helping them diagnose failures, improve reliability, and work through complex distributed systems problems.
The problems range from Kubernetes scheduling and GPU orchestration to distributed PyTorch failures, inference latency, networking bottlenecks, storage performance, and platform reliability. You'll gain exposure to a wide variety of real-world AI workloads across industries and help shape the infrastructure powering the next generation of ML applications.
What You'll Do
Work Directly With ML Engineers
- Partner directly with customer engineering teams running training and inference workloads in production
- Help customers diagnose and resolve complex distributed systems and ML infrastructure issues
- Act as a technical advisor during high-impact incidents and platform degradation events
- Translate infrastructure-level issues into actionable guidance for ML engineers
- Build credibility with customers through strong technical reasoning and clear communication
Debug ML Infrastructure & Distributed Workloads
- Investigate failures involving distributed training, Kubernetes orchestration, GPU allocation, networking, and storage systems
- Troubleshoot issues related to PyTorch, CUDA, NCCL, and inference serving
- Analyze logs, metrics, traces, and system behavior to isolate root causes
- Debug containerized workloads running across Kubernetes and bare-metal GPU environments
- Support customers scaling workloads across multi-node GPU systems
- Diagnose performance bottlenecks involving compute, memory, networking, or storage
Improve Reliability & Platform Operations
- Identify recurring patterns across customer issues and drive long-term reliability improvements
- Contribute to post-incident reviews and operational improvements
- Build internal tooling, automation, documentation, and runbooks
- Partner closely with infrastructure, networking, and platform engineering teams
- Help improve observability, operational visibility, and troubleshooting workflows
- Improve the customer experience through better processes and technical guidance
What This Role Is Not
To set clear expectations:
- This is not a traditional help desk or ticket-routing support role
- This is not purely customer success or account management
- This is not a backend engineering role
- This is not a passive escalation position
This role is for engineers who enjoy solving difficult technical problems while working closely with other engineers.
What You'll Need
Required Qualifications
Infrastructure & Systems
- Strong software engineering and systems troubleshooting background
- Experience with Kubernetes and containerized environments
- Linux systems knowledge, including networking, storage, process management, and performance tuning
- Experience with cloud infrastructure and distributed systems
- Experience with observability and debugging tools such as Prometheus, Grafana, or OpenTelemetry
ML Infrastructure Experience
- Hands-on experience operating machine learning workloads in production or research environments
- Experience with distributed ML systems and tooling such as PyTorch, CUDA, or NCCL
- Familiarity with GPU infrastructure and orchestration
- Experience troubleshooting performance, reliability, or scaling issues in ML infrastructure
- Understanding of the operational challenges involved in running ML systems at scale
Collaboration
- Strong communication skills and ability to work directly with highly technical customers and engineering teams
- Comfortable operating in fast-moving, highly ambiguous environments
- Enjoys solving complex technical problems collaboratively
Ideal Experience
- Experience with large-scale model training or distributed inference systems
- Familiarity with Ray, Kubeflow, Slurm, or similar distributed scheduling platforms
- Experience with InfiniBand, RDMA, or high-performance networking
- Experience operating bare-metal infrastructure
- Familiarity with storage systems commonly used in ML environments
- Experience working at an AI infrastructure, cloud, MLOps, or developer tooling company
- Contributions to platform engineering, developer infrastructure, or operational tooling projects
- Experience writing automation, tooling, or scripts in Python or similar languages
This role is hybrid out of our Seattle or San Francisco offices, with an in-office requirement of at least 2 days per week and occasional team and company offsites. The role follows a Monday–Friday schedule, with working hours from 8:00 AM to 5:00 PM PST. We are not able to provide visa sponsorship for this role at this time.
We are committed to offering competitive compensation that reflects the value each team member brings to our mission. Final offers are based on factors such as experience, skills, geographic location, and role expectations. In addition to base salary, our total rewards package for eligible roles includes a discretionary bonus, a meaningful equity component, and comprehensive benefits.
The anticipated annual base salary range for this role is:
$115,000 - $140,000 USD
Benefits and Perks
We offer a comprehensive and competitive benefits package designed to support our employees' health, well-being, and long-term success. Benefits may vary by location, team, and role.
Benefits include:
- Comprehensive medical, dental, and vision coverage (U.S.); private medical and dental insurance (U.K.)
- Retirement and financial wellness support (U.S.); pension contribution (U.K.)
- Generous paid time off, plus holidays
- Paid parental leave
- Professional development support
- Wellness and work-from-home stipends
- Flexible work environment
At Lightning AI, we are committed to fostering an inclusive and diverse workplace. We believe that diverse teams drive innovation and create better products. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other protected characteristic. We are dedicated to building a culture where everyone can thrive and contribute to their fullest potential.

