Staff ML Performance Engineer Scalable Inference & CUDA

Modal

A leading AI infrastructure company based in New York is seeking experienced engineers to enhance the performance of ML systems and contribute to open-source projects. Ideal candidates will have over 5 years of experience in writing high-quality code and familiarity with Nvidia GPU architecture and ML frameworks. This role offers opportunities for significant growth within a fast-growing team and requires in-person collaboration in NYC, San Francisco, or Stockholm.

Vacancy posted 6 hours ago
Similar jobs that could be interesting for you
Based on the Staff ML Performance Engineer Scalable Inference & CUDA in New York, NY vacancy
  •  ...jobs, and serve low-latency inference. We have thousands of...  ...medalists, and experienced engineering and product leaders with decades...  ...with experience in making ML systems performant at scale. If you are interested...  ...GPU architecture and CUDA. Experience with ML performance... 
    Performance

    Modal Labs

    New York, NY
    4 days ago
  • $140k - $180k

     ...Artera is seeking a Machine Learning Engineer to develop scalable pipelines for model training and evaluation, collaborate with AI teams, and optimize model performance. The ideal candidate will have over 5 years of software engineering experience and strong expertise... 
    Performance
    Remote work

    Artera Corporation

    New York, NY
    2 days ago
  • $128.7k - $261.3k

     ...Team The Model Deployment & Inference Solutions team in GM AV...  ...mission is two-fold: build the ML deployment platform that...  ...automating workflows currently performed manually by engineers. Build the developer...  ...at the integration level (CUDA-aware Python, TensorRT, Triton... 
    Performance
    Flexible hours
    Shift work

    General Motors

    New York, NY
    3 days ago
  •  ...the first and founding ML Operations Engineer at Tennr, you’ll play...  ...training and inference pipelines that can handle...  ...is powered by robust, scalable, and efficiently deployed...  ...systems to enhance performance and efficiency....  ...inference) involving CUDA profiling, memory optimization... 
    Performance
    Work at office

    Tennr

    New York, NY
    1 day ago
  •  ...join their Technology team. The role involves designing high-performance infrastructure for generative AI and machine learning workloads...  ...should have a relevant degree and 3-7 years of experience in scalable systems. The position offers competitive compensation, health... 
    Performance

    Point72 Asset Management, L.P

    New York, NY
    6 hours ago
  •  ...About the Role As an ML Research Engineer at Maple, you'll be a...  ...automated systems to monitor performance, detect anomalies,...  ...optimized production inference. Lead evaluations,...  ...robustness and scalability. Balance research innovation...  ...experience with CUDA/Triton preferred. Proven... 
    Performance
    Work at office
    Local area

    Maple

    New York, NY
    3 days ago
  • $128.7k - $261.3k

     ...seeks a skilled professional to develop its ML deployment platform within the...  ...deployment from training to on-vehicle inference and enhancing developer experience through...  ...from $128,700 to $261,300 with additional performance bonuses and a comprehensive benefits package... 
    Performance

    General Motors

    New York, NY
    3 days ago
  • $200k - $250k

     ...we’re building the top-performing AI Shopping Agent that...  ..., and trust. Our ML models power the core...  ...experienced Senior MLOps Engineer to take ownership of how...  ...– for a custom-built inference platform powering a live...  ..., cost-efficient, and scalable, partnering with... 
    Performance
    Remote work
    Flexible hours

    Wizard

    New York, NY
    2 days ago
  •  ...seeking a Machine Learning Performance Engineer to join our team, focusing on...  ...infrastructure, training, and inference challenges to advance our...  ...strategies. Responsibilities Build scalable and robust training and...  ...-level GPU programming with CUDA, including Tensor Cores,... 
    Performance
    Work at office

    Optiver Holding BV

    New York, NY
    1 day ago
  •  ...the Role Mirage is seeking an ML Engineer to build and scale the...  ...objectives, scaling strategies, and inference optimization and efficiency...  ...Monitor and improve model performance in real‑world usage What...  ...infrastructure Expertise in PyTorch, CUDA, Triton, and distributed... 
    Performance
    Full time
    Local area
    Night shift

    Mirage

    New York, NY
    1 day ago
  • $128.7k - $261.3k

     ...model export, kernel development, and performance engineering so that every cycle on our accelerators...  ...sit at the heart of our on‑vehicle ML inference for ADAS and autonomous driving. We own...  ...implement, benchmark, and iterate on CUDA‑based kernels and custom operators to... 
    Performance
    Flexible hours

    General Motors

    New York, NY
    3 days ago
  • $200k - $265k

     ...Senior Machine Learning Engineer on the AI Image...  ...machine learning and scalable ML infrastructure will be...  ...responsiveness to prompting, inference time, and...  ...experiments to benchmark model performance, tracking quality metrics...  ...ComfyUI, TensorRT, and CUDA. ~ Experience... 
    Performance
    Work at office

    Cantina

    New York, NY
    2 days ago
  •  ...Summary: We are seeking an ML Engineer to bridge the gap between...  ...role focuses on building scalable, reliable systems to serve machine...  ...environments, ensuring high performance and operational efficiency....  ...models Optimize model inference for speed, scalability, and... 
    Performance

    Compunnel

    Jersey City, NJ
    2 days ago
  • $175k - $280k

     ...layer, integrating LLM, speech, and vision models. The ideal candidate has significant experience in systems programming and performance engineering, aiming to improve high-throughput, low-latency serving. Join a team dedicated to pioneering advancements in voice agents... 
    Performance

    SESAME

    New York, NY
    6 hours ago
  •  ...Machine Learning Engineer - Inference / Serving Join to apply for the Machine Learning Engineer...  ...Today, we are focused on bringing the performance of closed‑web user acquisition to the open...  ...CTV products. This is an applied ML systems role—equal parts engineering depth... 
    Performance
    Full time
    Remote work

    Yobi AI

    New York, NY
    6 hours ago
  •  ...ML Engineer Georgia, Georgia, United States About the Job Our client is a...  ...preprocessing, model training, deployment, inference, and monitoring in production...  ...ML infrastructure and processes for scalability and performance. Qualifications ~ Bachelor... 
    Performance
    Full time

    Catalyst Labs, LLC

    New York, NY
    5 days ago
  • $110k - $130k

     ...Machine Learning (ML) at the New York Times...  ...York Times real-time ML inference models, including both...  ...end, our partners are engineering systems that call these...  ...deploying ML models as scalable, low-latency, and highly...  ...data drift, and model performance degradation.... 
    Performance
    Local area
    Flexible hours

    New York Times

    New York, NY
    5 days ago
  • $185.1k - $335.3k

     ...kernel development, and performance engineering so that every cycle on...  ...into fast, reliable inference across GPUs powering GM...  ...into MLIR/ONNX and CUDA/TensorRT internals, and...  ...driving. The Role As a Staff Compiler Engineer on the...  ..., and effortless for ML engineers across the... 
    Performance
    Flexible hours

    General Motors

    New York, NY
    3 days ago
  •  ...platform helps contractors, engineering firms, and utilities...  ...of our training and inference pipelines, fortifying...  ...reliable, high-performing, and secure actionable...  ...: Design and maintain scalable architectures for serving...  ...packaging and scaling ML applications. Infrastructure... 
    Performance
    For contractors

    SewerAI Corporation

    New York, NY
    3 days ago
  •  ...Machine Learning Research Engineer to join our team,...  ...infrastructure, training, and inference challenges to advance...  ...forecasting Build scalable and robust training...  ...in a supportive, high-performing environment alongside...  ...or other accelerators (CUDA, Triton, Pallas, etc.)... 
    Performance
    Work at office

    Optiver Holding BV

    New York, NY
    1 day ago
  • $200.2k - $357.5k

     ...operations. We’re hiring a Staff / Senior Staff...  ...Infrastructure Engineer to lead the design...  ...of our end‑to‑end ML platform powering...  ...driving scalable innovation for customers...  ...experimentation, batch/online inference, edge) used by...  ...tied to performance, subject to plan terms... 
    Performance
    Full time
    Work at office
    Remote work
    Flexible hours

    Samsara

    New York, NY
    3 days ago
  •  ...looking to bring on an experienced Data Engineer to join Eight Sleep’s machine learning team...  ...This role involves monitoring production ML systems, building tools and pipelines for...  ...machine learning systems to ensure performance and reliability Develop big data ETL pipelines... 
    Performance
    Full time
    Sleeping nights
    Flexible hours
    Night shift

    Eight Sleep

    New York, NY
    3 days ago
  • $160k - $200k

     ...layer that can accurately and scalably synthesize information from...  ...We’re hiring an exceptional ML Engineer to join our team (Boston or...  ...efficient, secure, reliable, and performant ML pipelines and...  ...systems (design, training, inference, deployment, and monitoring;... 
    Performance
    Work at office

    Verana Health

    New York, NY
    5 days ago
  •  ...highly experienced Principal ML Engineer (Applied / Systems) to join...  ...turning them into robust, scalable production systems that serve...  ...all with a strong focus on performance, maintainability, and impact...  ...Build, optimize, and scale inference pipelines and model serving... 
    Performance

    Soris

    New York, NY
    5 days ago
  • $175k - $250k

     ...Senior Machine Learning Engineer (ML Infrastructure & Data Systems...  ...continuously improving system performance through tight feedback...  ...performance Ensure reliability, scalability, and high availability of...  ...and scaled ML training and inference systems in production... 
    Performance

    Right Hand Talent

    Brooklyn, NY
    2 days ago
  •  ...ML Research Engineer, ML Systems Scale's ML platform (RLXF) team builds...  ...language model training and inference. The platform has been powering...  ...and tools such as CUDA, Pytorch, transformers, flash...  ...qualifications, interview performance, and relevant education or... 
    Performance

    Scale AI

    New York, NY
    2 days ago
  •  ...AI/ML Engineer – Develop LLM-powered solutions to enhance RAG, entity...  ...with LLMs, monitoring performance through telemetry, and continuously...  ...requirements into robust, scalable AI/ML solutions. This is a hands...  ...telemetry from AI Gateway Inference Tables and custom logs,... 
    Performance
    Worldwide

    Lakefusion

    New York, NY
    2 days ago
  •  ...What you will do As our ML Engineer Intern, you'll be the...  ...product ecosystem Designing scalable ML infrastructure and pipelines...  ...media datasets Implementing inference systems for content...  ...systems Optimizing model performance for cost efficiency while maintaining... 
    Performance
    Contract work
    Internship
    Immediate start
    Remote work
    Worldwide

    Melotech

    New York, NY
    2 days ago
  • $150k - $300k

     ...We Do At Goldman Sachs, our Engineers don't just make things - we...  ...teams that build massively scalable software and systems,...  ...microservices. Scalability & Performance: Optimize inference latency and manage token costs...  ...least 3 years focused on AI/ML integration in production.... 
    Performance
    Immediate start

    The Goldman Sachs Group

    New York, NY
    6 hours ago
  •  ...contractual opportunity for an AI / ML Engineer located at the client site in...  ...models, integrating them into scalable systems, and ensuring their reliability and performance through robust ML Ops...  ...model serving: NVIDIA Triton Inference Server; vLLM; or Ollama. Knowledge... 
    Performance
    Work at office

    Trigyn Technologies

    New York, NY
    4 days ago
