Platform Support Engineer (US)
$115k - $140k
Lightning AI
Who We Are
Lightning AI is the company behind PyTorch Lightning. Founded in 2019, we build an end-to-end platform for developing, training, and deploying AI systems, designed to take ideas from research to production with less friction. Through our merger with Voltage Park, a neocloud and AI Factory, Lightning AI combines developer-first software with cost-efficient, large-scale compute. Teams get the tools they need for experimentation, training, and production inference, with security, observability, and control built in.
We serve solo researchers, startups, and large enterprises. Lightning AI operates globally with offices in New York City, San Francisco, Seattle, and London, and is backed by Coatue, Index Ventures, Bain Capital Ventures, and Firstminute.
What We're Looking For
Lightning AI is looking to hire a Platform Support Engineer to join our US Customer Experience team, supporting ML engineers running large-scale training and inference workloads across cloud infrastructure, Kubernetes, and GPU platforms in production environments.
This role sits at the intersection of ML systems, cloud infrastructure, Kubernetes, and customers. You'll support engineers training models, deploying inference systems, and scaling GPU workloads in production.
You are not a ticket router or traditional support engineer. You are a technical partner to ML teams, helping diagnose failures, improve reliability, and guide customers through complex distributed systems problems.
The problems range from Kubernetes scheduling and GPU orchestration to distributed PyTorch failures, inference latency, networking bottlenecks, storage performance, and platform reliability. You'll gain exposure to a wide variety of real-world AI workloads across industries and help shape the infrastructure powering the next generation of ML applications.
What You'll Do
Work Directly With ML Engineers
- Partner directly with customer engineering teams running training and inference workloads in production
- Help customers diagnose and resolve complex distributed systems and ML infrastructure issues
- Act as a technical advisor during high-impact incidents and platform degradation events
- Translate infrastructure-level issues into actionable guidance for ML engineers
- Build credibility with customers through strong technical reasoning and clear communication
- Investigate failures involving distributed training, Kubernetes orchestration, GPU allocation, networking, and storage systems
- Troubleshoot PyTorch, CUDA, NCCL, and inference-serving issues
- Analyze logs, metrics, traces, and system behavior to isolate root causes
- Debug containerized workloads running across Kubernetes and bare-metal GPU environments
- Support customers scaling workloads across multi-node GPU systems
- Diagnose performance bottlenecks involving compute, memory, networking, or storage
- Identify recurring patterns across customer issues and drive long-term reliability improvements
- Contribute to post-incident reviews and operational improvements
- Build internal tooling, automation, documentation, and runbooks
- Partner closely with infrastructure, networking, and platform engineering teams
- Help improve observability, operational visibility, and troubleshooting workflows
- Improve the customer experience through better processes and technical guidance
- This is not a traditional help desk or ticket routing support role
- This is not purely customer success or account management
- This is not a backend engineering role
- This is not a passive escalation position
Required Qualifications
Infrastructure & Systems
- Strong software engineering and systems troubleshooting background
- Experience with Kubernetes and containerized environments
- Linux systems knowledge, including networking, storage, process management, and performance tuning
- Experience with cloud infrastructure and distributed systems
- Experience with observability and debugging tools such as Prometheus, Grafana, or OpenTelemetry
- Hands-on experience operating machine learning workloads in production or research environments
- Experience with distributed ML systems and tooling such as PyTorch, CUDA, or NCCL
- Familiarity with GPU infrastructure and orchestration
- Experience troubleshooting performance, reliability, or scaling issues in ML infrastructure
- Understanding of the operational challenges involved in running ML systems at scale
- Strong communication skills and ability to work directly with highly technical customers and engineering teams
- Comfortable operating in fast-moving, highly ambiguous environments
- Enjoys solving complex technical problems collaboratively
- Experience with large-scale model training or distributed inference systems
- Familiarity with Ray, Kubeflow, Slurm, or similar distributed scheduling platforms
- Experience with InfiniBand, RDMA, or high-performance networking
- Experience operating bare-metal infrastructure
- Familiarity with storage systems commonly used in ML environments
- Experience working at an AI infrastructure, cloud, MLOps, or developer tooling company
- Contributions to platform engineering, developer infrastructure, or operational tooling projects
- Experience writing automation, tooling, or scripts in Python or similar languages
We are committed to offering competitive compensation that reflects the value each team member brings to our mission. Final offers are based on factors such as experience, skills, geographic location, and role expectations. In addition to base salary, our total rewards package for eligible roles includes a discretionary bonus, a meaningful equity component, and comprehensive benefits.
The anticipated annual base salary range for this role is: $115,000-$140,000 USD
Benefits and Perks
We offer a comprehensive and competitive benefits package designed to support our employees' health, well-being, and long-term success. Benefits may vary by location, team, and role. Benefits include:
- Comprehensive medical, dental and vision coverage (U.S.); Private medical and dental insurance (U.K.)
- Retirement and financial wellness support (U.S.); Pension contribution (U.K.)
- Generous paid time off, plus holidays
- Paid parental leave
- Professional development support
- Wellness and work-from-home stipends
- Flexible work environment
Vacancy posted 2 days ago


