Machine Learning Solutions Engineer (ML + Infrastructure Focus)

$150k - $195k

Lightning AI

Who We Are

Lightning AI is the company behind PyTorch Lightning. Founded in 2019, we build an end-to-end platform for developing, training, and deploying AI systems, designed to take ideas from research to production with less friction.

Through our merger with Voltage Park, a neocloud and AI Factory, Lightning AI combines developer-first software with cost-efficient, large-scale compute. Teams get the tools they need for experimentation, training, and production inference, with security, observability, and control built in.

We serve solo researchers, startups, and large enterprises. Lightning AI operates globally with offices in New York City, San Francisco, Seattle, and London, and is backed by Coatue, Index Ventures, Bain Capital Ventures, and Firstminute.

What We're Looking For

Lightning is looking for a Machine Learning Solutions Engineer with a focus on ML and Infrastructure to join our Sales team in New York. As a Machine Learning Solutions Engineer, you will operate at the intersection of machine learning, distributed systems, and cloud infrastructure. You will partner with customers to design and deploy end-to-end AI systems, spanning:
  • Model development and training
  • GPU infrastructure and cluster design
  • Distributed inference and production deployment
This role goes beyond traditional ML solutions engineering: you will act as a technical architect, helping customers make critical decisions across compute, orchestration, and system design.

The role is hybrid out of our New York City office hub, with an in-office requirement of at least 3 days per week and occasional team and company offsites. We are not able to provide visa sponsorship for this role at this time.

What You'll Do
Customer Architecture & Technical Leadership
  • Partner with customers to understand ML workloads, infrastructure constraints, and scaling requirements
  • Architect end-to-end solutions across:
    • Data pipelines (CPU → GPU workflows)
    • Distributed training (multi-node, multi-GPU)
    • High-throughput inference systems
  • Translate business goals (latency, cost, throughput) into technical system design decisions
GPU & Infrastructure Design
  • Design and optimize workloads across GPU clusters (H100, H200, B200, etc.)
  • Advise on:
    • Training vs inference cluster design
    • Interconnect choices (Ethernet vs InfiniBand / RDMA vs RoCE)
    • Storage strategies (local NVMe vs networked / object storage)
  • Model and optimize for:
    • Tokens/sec, tokens/$
    • Throughput vs latency tradeoffs
    • GPU utilization and scheduling efficiency
Kubernetes & Platform Systems
  • Design and support deployments on Kubernetes (EKS, GKE, on-prem clusters)
  • Work with:
    • GPU scheduling (time-slicing, MIG, bin-packing)
    • Autoscaling and workload orchestration
    • Helm-based deployments and multi-tenant environments
  • Help customers balance:
    • Raw Kubernetes flexibility vs platform abstraction (Lightning)
Demos, POCs, and Execution
  • Build and deliver technical demos and POCs that showcase:
    • Distributed training workflows
    • Scalable inference endpoints
    • End-to-end ML pipelines on Lightning AI
  • Scope and lead POCs aligned to customer success metrics (latency, cost, reliability)
Cross-Functional Impact
  • Act as the bridge between customers, product, and engineering
  • Provide feedback on:
    • Platform gaps in infrastructure, orchestration, and performance
    • Emerging patterns in GPU usage and distributed systems
  • Influence roadmap across ML workflows and infrastructure capabilities
Enablement & Thought Leadership
  • Create technical content:
    • Architecture guides (e.g., high-throughput LLM inference systems)
    • Best practices for GPU utilization and scaling
  • Educate customers on modern AI infrastructure patterns
What You'll Need
ML + Systems Expertise
  • 3-6+ years of experience in:
    • Machine Learning / AI Engineering
    • Solutions Engineering / Sales Engineering / ML Consulting
  • Strong understanding of:
    • Training vs inference workloads
    • Model optimization (quantization, batching, caching, etc.)
GPU & Distributed Systems
  • Experience working with:
    • GPU clusters (NVIDIA stack preferred)
    • Distributed training or inference systems
  • Familiarity with:
    • NCCL, CUDA, or GPU performance profiling
    • Networking concepts (RDMA, RoCE, InfiniBand, high-throughput systems)
Kubernetes & Cloud Platforms
  • Hands-on experience with:
    • Kubernetes (EKS, GKE, or on-prem)
    • Slurm
    • Containerization (Docker)
  • Exposure to:
    • GPU scheduling in Kubernetes environments
    • Multi-tenant or production ML deployments
Programming & Tooling
  • Strong Python skills (PyTorch preferred)
  • Experience building:
    • ML pipelines
    • APIs or inference services
  • Familiarity with Lightning AI, PyTorch Lightning, or similar frameworks is a plus
Customer-Facing Excellence
  • Ability to:
    • Explain complex infrastructure and ML tradeoffs clearly
    • Run technical discovery and uncover quantifiable success metrics
  • Experience working cross-functionally with:
    • Sales, product, and engineering teams
Compensation

The annual base pay range for this role is $150,000 - $195,000, in addition to a variable pay component and meaningful equity.


Benefits and Perks

We offer a comprehensive and competitive benefits package designed to support our employees' health, well-being, and long-term success. Benefits may vary by location, team, and role.

Benefits include:
  • Comprehensive medical, dental and vision coverage (U.S.); Private medical and dental insurance (U.K.)
  • Retirement and financial wellness support (U.S.); Pension contribution (U.K.)
  • Generous paid time off, plus holidays
  • Paid parental leave
  • Professional development support
  • Wellness and work-from-home stipends
  • Flexible work environment

At Lightning AI, we are committed to fostering an inclusive and diverse workplace. We believe that diverse teams drive innovation and create better products. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other protected characteristic. We are dedicated to building a culture where everyone can thrive and contribute to their fullest potential.
Vacancy posted 1 day ago
