Machine Learning Solutions Engineer (ML + Infrastructure Focus)

$150k - $195k

Lightning AI

New York, New York, United States; San Francisco, California, United States; Seattle, Washington, United States

Who We Are

Lightning AI is the company behind PyTorch Lightning. Founded in 2019, we build an end-to-end platform for developing, training, and deploying AI systems—designed to take ideas from research to production with less friction.

Through our merger with Voltage Park, a neocloud and AI Factory, Lightning AI combines developer-first software with cost-efficient, large-scale compute. Teams get the tools they need for experimentation, training, and production inference, with security, observability, and control built in.

We serve solo researchers, startups, and large enterprises. Lightning AI operates globally with offices in New York City, San Francisco, Seattle, and London, and is backed by Coatue, Index Ventures, Bain Capital Ventures, and Firstminute.

What We're Looking For

Lightning is looking for a Machine Learning Solutions Engineer with a focus on ML and Infrastructure to join our Sales team in New York. As a Machine Learning Solutions Engineer, you will operate at the intersection of machine learning, distributed systems, and cloud infrastructure. You will partner with customers to design and deploy end-to-end AI systems, spanning:

  • Model development and training
  • GPU infrastructure and cluster design
  • Distributed inference and production deployment

This role goes beyond traditional ML solutions engineering—you will act as a technical architect, helping customers make critical decisions across compute, orchestration, and system design.

The role is hybrid out of one of our hub locations (New York City, San Francisco, Seattle) with an in-office requirement of at least 2 days per week and occasional team and company offsites. We are not able to provide visa sponsorship for this role at this time.

What You'll Do
Customer Architecture & Technical Leadership
  • Partner with customers to understand ML workloads, infrastructure constraints, and scaling requirements
  • Architect end-to-end solutions across:
    • Data pipelines (CPU → GPU workflows)
    • Distributed training (multi-node, multi-GPU)
    • High-throughput inference systems
  • Translate business goals (latency, cost, throughput) into technical system design decisions
GPU & Infrastructure Design
  • Design and optimize workloads across GPU clusters (H100, H200, B200, etc.)
  • Advise on:
    • Training vs inference cluster design
    • Interconnect choices (Ethernet vs InfiniBand / RDMA vs RoCE)
    • Storage strategies (local NVMe vs networked / object storage)
  • Model and optimize for:
    • Tokens/sec, tokens/$
    • Throughput vs latency tradeoffs
    • GPU utilization and scheduling efficiency
Kubernetes & Platform Systems
  • Design and support deployments on Kubernetes (EKS, GKE, on-prem clusters)
  • Work with:
    • GPU scheduling (time-slicing, MIG, bin-packing)
    • Autoscaling and workload orchestration
    • Helm-based deployments and multi-tenant environments
  • Help customers balance:
    • Raw Kubernetes flexibility vs platform abstraction (Lightning)
Demos, POCs, and Execution
  • Build and deliver technical demos and POCs that showcase:
    • Distributed training workflows
    • Scalable inference endpoints
    • End-to-end ML pipelines on Lightning AI
  • Scope and lead POCs aligned to customer success metrics (latency, cost, reliability)
Cross-Functional Impact
  • Act as the bridge between customers, product, and engineering
  • Provide feedback on:
    • Platform gaps in infrastructure, orchestration, and performance
    • Emerging patterns in GPU usage and distributed systems
  • Influence roadmap across ML workflows and infrastructure capabilities
Enablement & Thought Leadership
  • Create technical content:
    • Architecture guides (e.g., high-throughput LLM inference systems)
    • Best practices for GPU utilization and scaling
  • Educate customers on modern AI infrastructure patterns
What You'll Need
ML + Systems Expertise
  • 3–6+ years of experience in:
    • Machine Learning / AI Engineering
    • Solutions Engineering / Sales Engineering / ML Consulting
  • Strong understanding of:
    • Training vs inference workloads
    • Model optimization (quantization, batching, caching, etc.)
GPU & Distributed Systems
  • Experience working with:
    • GPU clusters (NVIDIA stack preferred)
    • Distributed training or inference systems
  • Familiarity with:
    • NCCL, CUDA, or GPU performance profiling
    • Networking concepts (RDMA, RoCE, InfiniBand, high-throughput systems)
Kubernetes & Cloud Platforms
  • Hands-on experience with:
    • Kubernetes (EKS, GKE, or on-prem)
    • Slurm
    • Containerization (Docker)
  • Exposure to:
    • GPU scheduling in Kubernetes environments
    • Multi-tenant or production ML deployments
Programming & Tooling
  • Strong Python skills (PyTorch preferred)
  • Experience building:
    • ML pipelines
    • APIs or inference services
  • Familiarity with Lightning AI, PyTorch Lightning, or similar frameworks is a plus
Customer-Facing Excellence
  • Ability to:
    • Explain complex infrastructure and ML tradeoffs clearly
    • Run technical discovery and uncover quantifiable success metrics
  • Experience working cross-functionally with:
    • Sales, product, and engineering teams
Compensation

The annual base pay range for this role is $150,000 - $195,000, in addition to a variable pay component and meaningful equity.

Benefits and Perks

We offer a comprehensive and competitive benefits package designed to support our employees' health, well-being, and long-term success. Benefits may vary by location, team, and role.

Benefits include:

  • Comprehensive medical, dental and vision coverage (U.S.); Private medical and dental insurance (U.K.)
  • Retirement and financial wellness support (U.S.); Pension contribution (U.K.)
  • Generous paid time off, plus holidays
  • Paid parental leave
  • Professional
Vacancy posted 3 days ago