
Staff ML Performance Engineer — Scalable Inference & CUDA

Modal

A leading AI infrastructure company based in New York is seeking experienced engineers to enhance the performance of ML systems and contribute to open-source projects. Ideal candidates will have over 5 years of experience in writing high-quality code and familiarity with Nvidia GPU architecture and ML frameworks. This role offers opportunities for significant growth within a fast-growing team and requires in-person collaboration in NYC, San Francisco, or Stockholm.

Vacancy posted 3 days ago
Similar jobs that could be interesting for you
Based on the Staff ML Performance Engineer — Scalable Inference & CUDA in New York, NY vacancy
  •  ...the Role As an ML Research Engineer at Maple, you'll be...  ...automated systems to monitor performance, detect anomalies,...  ...optimized production inference. Lead evaluations,...  ...robustness and scalability. Balance research...  ...optimization experience with CUDA/Triton preferred. ~... 
    Performance
    Work at office
    Local area

    Maple AI, Inc

    New York, NY
    4 days ago
  •  ...the first and founding ML Operations Engineer at Tennr, you’ll play...  ...training and inference pipelines that can handle...  ...is powered by robust, scalable, and efficiently deployed...  ...systems to enhance performance and efficiency....  ...inference) involving CUDA profiling, memory optimization... 
    Performance
    Work at office

    Tennr

    New York, NY
    5 days ago
  •  ...and deploy production‑grade ML systems with end‑to‑end...  ...model training, deployment, inference, and monitoring in production...  ...infrastructure and processes for scalability and performance. Qualifications Bachelor’s...  ...experience in ML engineering. Strong programming skills... 
    Performance
    Full time

    Catalyst Labs, LLC

    New York, NY
    4 days ago
  •  ...join their Technology team. The role involves designing high-performance infrastructure for generative AI and machine learning workloads...  ...should have a relevant degree and 3-7 years of experience in scalable systems. The position offers competitive compensation, health... 
    Performance

    Point72 Asset Management, L.P

    New York, NY
    1 day ago
  •  ...help healthcare professionals perform at their best. At Solventum,...  ....**Job Description:****ML Engineer****3M Health Care is now Solventum...  ...AI services are secure and scalable.**Key Responsibilities****1....  ...for model training and inference.* **Feature Management:** Help... 
    Performance
    H1b
    Remote work

    Solventum

    New York, NY
    1 day ago
  • $200k

     ...seeking a Machine Learning Performance Engineer to join our team, focusing on...  ...infrastructure, training, and inference challenges to advance our...  ...What you'll do: Build scalable and robust training and...  ...-level GPU programming with CUDA, including Tensor Cores, cooperative... 
    Performance
    Work at office

    Optiver

    New York, NY
    5 days ago
  •  ...Machine Learning / Software Engineer Dyania Health is a...  ...mission. As a senior ML engineer at Dyania,...  ..., build, and deploy scalable ML-driven systems that...  ...optimization, deployment, and inference at scale. Architect...  ...model and system performance; communicate findings... 
    Performance
    Internship
    Local area
    Remote work
    Flexible hours
    Shift work

    HealthX Ventures

    Jersey City, NJ
    3 days ago
  • Tubi Tv is seeking a Software Engineer specializing in ML Infra & Distributed Systems to enhance their...  ...and ML teams, you will design high-performance, low-latency systems that power...  ...Ideal candidates have experience in scalable system design and an enthusiasm for... 
    Performance

    Tubi Tv

    New York, NY
    3 days ago
  • $200k

     ...Machine Learning Research Engineer to join our team,...  ...infrastructure, training, and inference challenges to advance...  ...Build scalable and robust training and...  ...in a supportive, high-performing environment alongside...  ...or other accelerators (CUDA, Triton, Pallas, etc.)... 
    Performance
    Work at office

    Optiver

    New York, NY
    5 days ago
  • The Consensus is looking for a Software Engineer focused on ML performance to join our team in New York. This role involves working with cutting-edge AI technologies and optimizing ML models, particularly large language models (LLMs). Ideal candidates will possess strong... 
    Performance
    Flexible hours

    The Consensus

    New York, NY
    5 days ago
  • $200k - $250k

     ...we’re building the top-performing AI Shopping Agent that...  ..., and trust. Our ML models power the core...  ...experienced Senior MLOps Engineer to take ownership of how...  ...- for a custom-built inference platform powering a live...  ..., cost-efficient, and scalable, partnering with... 
    Performance
    Remote work
    Flexible hours

    Wizard

    New York, NY
    3 days ago
  • $200k - $265k

     ...Senior Machine Learning Engineer on the AI Image...  ...machine learning and scalable ML infrastructure will be...  ...responsiveness to prompting, inference time, and...  ...experiments to benchmark model performance, tracking quality metrics...  ...ComfyUI, TensorRT, and CUDA. Experience building... 
    Performance
    Work at office

    Cantina

    New York, NY
    2 days ago
  •  ...platform helps contractors, engineering firms, and utilities...  ...of our training and inference pipelines, fortifying...  ...reliable, high-performing, and secure actionable...  ...: Design and maintain scalable architectures for serving...  ...packaging and scaling ML applications. Infrastructure... 
    Performance
    For contractors

    SewerAI Corporation

    New York, NY
    3 days ago
  • Machine Learning Engineer - Inference / Serving Join to apply for the Machine Learning Engineer - Inference...  ...Today, we are focused on bringing the performance of closed‑web user acquisition to the...  ...and CTV products. This is an applied ML systems role—equal parts engineering... 
    Performance
    Full time
    Remote work

    Yobi AI

    New York, NY
    1 day ago
  • $175k - $280k

     ...layer, integrating LLM, speech, and vision models. The ideal candidate has significant experience in systems programming and performance engineering, aiming to improve high-throughput, low-latency serving. Join a team dedicated to pioneering advancements in voice agents... 
    Performance

    Sesame

    New York, NY
    8 days ago
  •  ...AI/ML Engineer We are seeking a highly skilled Senior Developer...  ...engineering expertise in building scalable data systems and good...  ...and consistency. Ensure performance and stability of LLM-based components...  ...LLMOps tools and scalable inference strategies. Prior work... 
    Performance
    Local area

    RIT Solutions

    New York, NY
    2 days ago
  • $110k - $130k

     ...: Machine Learning (ML) at the New York Times...  ...York Times real-time ML inference models, including both...  ...end, our partners are engineering systems that call...  ...deploying ML models as scalable, low-latency, and highly...  ...data drift, and model performance degradation. *... 
    Performance
    Full time
    Local area
    Flexible hours

    The New York Times

    New York, NY
    4 hours ago
  • $160k - $200k

     ...layer that can accurately and scalably synthesize information from...  ...We’re hiring an exceptional ML Engineer to join our team (Boston or...  ...efficient, secure, reliable, and performant ML pipelines and...  ...systems (design, training, inference, deployment, and monitoring;... 
    Performance
    Work at office

    Verana Health

    New York, NY
    5 days ago
  • $200.2k - $357.5k

     ...operations. We’re hiring a Staff / Senior Staff...  ...Infrastructure Engineer to lead the design...  ...of our end-to-end ML platform powering...  ...batch and online inference, and edge deployment...  ...and operate scalable online and batch inference...  ...tied to performance, subject to plan terms... 
    Performance
    Full time
    Work at office
    Remote work
    Flexible hours

    Samsara

    New York, NY
    3 days ago
  • $170k - $190k

     ...interruption handling, streaming inference, and audio quality, and...  ...translate these into scalable, enterprise-grade...  ...production Improve model performance and inference workflows...  ...the team, mentoring engineers and promoting best practices in ML engineering Partner with... 
    Performance
    Remote work

    ASAPP

    New York, NY
    2 days ago
  •  ...needs. Collaborate with data scientists and software engineers to design and implement scalable and efficient solutions. Clean, preprocess, and analyze...  ...into production environments and monitor their performance. Continuously improve model accuracy and performance... 
    Performance

    Resolve Tech Solutions

    New York, NY
    2 days ago
  • $160k - $230k

     ...Core Linux · Low Latency · Network Engineering AI/ML Solutions Architect – Distributed Training...  ...training, multi-GPU systems, and scalable AI inference infrastructure. You'll work directly...  ..., you'll: Design and deploy high-performance ML pipelines across hundreds/thousands... 
    Performance
    Full time
    Remote work

    Doghouse Recruitment

    New York, NY
    3 days ago
  •  ...Machine Learning Engineer ExaCare Inc – New York, New...  ...processes that enable ML to move from research...  ...turn their work into scalable, maintainable, and cost...  ...support model training and inference Build tooling and...  ...for monitoring model performance , system reliability,... 
    Performance
    Flexible hours

    ExaCare Inc

    New York, NY
    1 day ago
  • $200k - $300k

     ...Hiring: Machine Learning Engineer II (Autonomous...  ...mission by developing scalable, production-grade models...  ...to building end-to-end ML systems for large-scale...  ...teams to ensure model performance in simulation and on-vehicle...  ...The TalentHaus by 2x Inferred from the description... 
    Performance
    Full time
    Immediate start
    Remote work

    The TalentHaus

    New York, NY
    3 days ago
  • $153k - $198k

     ...Senior Machine Learning Engineer, you will own the end to end ML lifecycle at Button, from...  ...for latency, scalability, cost efficiency, reproducibility...  ...workflows, model deployment, inference services, monitoring,...  ...services with clear performance, reliability, and latency... 
    Performance
    Local area

    Button

    New York, NY
    3 days ago
  • $210k - $250k

     ...layer that can accurately and scalably synthesize information from...  ...We’re hiring an exceptional ML Engineer to join our team (Boston or...  ...models (methods to detect drift/performance degradation; develop...  ...systems (design, training, inference, deployment, and monitoring;... 
    Performance
    Work at office

    Verana Health

    New York, NY
    5 days ago
  • $150k - $215k

     ...team combining world‑class engineers with veteran strategists who...  ...augmentation at scale. Our ML team builds the services and...  ...tuning models to deploying high‑performance inference services, and we operate...  ...driving the development of scalable ML services for enrichment.... 
    Performance
    Permanent employment
    Contract work
    For contractors
    For subcontractor
    Work at office
    Remote work

    Vannevar Labs

    New York, NY
    3 days ago
  •  ...We are looking for an engineer with experience in low-level...  ...to join our growing ML team. Machine learning...  ...here is optimising the performance of our models - both training and inference. We care about efficient...  ...straightforward CUDA, but the interesting part... 
    Performance

    Jane Street

    New York, NY
    4 days ago
  •  ...Windmill is building the future of performance. Windmill is the first context graph...  ...Deployment : Design, build, and deploy scalable machine learning models to enhance product...  ...closely with data scientists, software engineers, and founders to integrate machine... 
    Performance
    Work at office
    Relocation

    WindMill

    New York, NY
    5 days ago
  •  ...Senior Machine Learning Engineer Disney...  ...distributed data and ML infrastructure that supports...  ...adjacent services such as inference inputs, feature APIs,...  ...layers. Contribute to scalable service patterns including...  ...system availability, performance, and cost efficiency.... 
    Performance
    Worldwide

    Walt Disney Company

    New York, NY
    4 days ago
