Staff ML Performance Engineer — Scalable Inference & CUDA

Modal

A leading AI infrastructure company based in New York is seeking experienced engineers to enhance the performance of ML systems and contribute to open-source projects. Ideal candidates will have over 5 years of experience in writing high-quality code and familiarity with Nvidia GPU architecture and ML frameworks. This role offers opportunities for significant growth within a fast-growing team and requires in-person collaboration in NYC, San Francisco, or Stockholm.

Vacancy posted 5 days ago
Similar jobs that could be interesting for you
Based on the Staff ML Performance Engineer — Scalable Inference & CUDA vacancy in New York, NY
  •  ...jobs, and serve low-latency inference. We have thousands of...  ...medalists, and experienced engineering and product leaders with decades...  ...with experience in making ML systems performant at scale. If you are interested...  ...GPU architecture and CUDA. Experience with ML performance... 
    Performance

    Modal Labs

    New York, NY
    5 days ago
  • A leading Behavioral AI company is seeking a Machine Learning Engineer focused on inference and serving. In this role, you will design and optimize systems to operationalize AI models. The ideal candidate has deep expertise in model deployment, a strong low-latency mindset... 
    Suggested
    Remote work

    Yobi AI

    New York, NY
    7 days ago
  • $128.7k - $261.3k

     ...Team The Model Deployment & Inference Solutions team in GM AV...  ...mission is two-fold: build the ML deployment platform that...  ...automating workflows currently performed manually by engineers. Build the developer...  ...at the integration level (CUDA-aware Python, TensorRT, Triton... 
    Performance
    Flexible hours
    Shift work

    General Motors

    New York, NY
    7 hours ago
  •  ...Reddit, Inc. is seeking a Staff Machine Learning Engineer to lead the development of a large-scale ML Inference Platform. Responsibilities include designing cloud-based ML...  ...Kubernetes and ensuring reliable, low-latency performance. Candidates should have 7+ years of... 
    Performance

    Reddit

    New York, NY
    7 hours ago
  •  ...Oefentherapie is seeking a skilled Machine Learning Engineer with expertise in developing scalable and efficient ML systems. The position emphasizes collaboration...  ...background in designing and architecting high-performance models. This role offers competitive salary and... 
    Performance

    Ll Oefentherapie

    New York, NY
    2 days ago
  •  ...A leading cloud technology company in the United States seeks an ML Performance Engineer Principal Lead to optimize inference performance across its platforms. The role involves evaluating techniques like quantization and hardware-aware scheduling. Ideal candidates will... 
    Performance

    Akamai

    New York, NY
    7 hours ago
  •  ...About the Role As an ML Research Engineer at Maple, you'll be a...  ...automated systems to monitor performance, detect anomalies,...  ...optimized production inference. Lead evaluations,...  ...robustness and scalability. Balance research innovation...  ...experience with CUDA/Triton preferred. Proven... 
    Performance
    Work at office
    Local area

    Maple

    New York, NY
    5 days ago
  •  ...join their Technology team. The role involves designing high-performance infrastructure for generative AI and machine learning workloads...  ...should have a relevant degree and 3-7 years of experience in scalable systems. The position offers competitive compensation, health... 
    Performance

    Point72 Asset Management, L.P

    New York, NY
    3 days ago
  • $128.7k - $261.3k

     ...seeks a skilled professional to develop its ML deployment platform within the...  ...deployment from training to on-vehicle inference and enhancing developer experience through...  ...from $128,700 to $261,300 with additional performance bonuses and a comprehensive benefits package... 
    Performance

    General Motors

    New York, NY
    7 hours ago
  •  ...Ocient is seeking a Senior Software Engineer - Machine Learning & Geospatial to enhance its ML capabilities. This fully remote position focuses on identifying...  ...to popular ML frameworks, ensuring efficient performance at scale. The ideal candidate has over 5 years of... 
    Performance
    Remote work

    Ocient

    New York, NY
    5 days ago
  •  ...Summary: We are seeking an ML Engineer to bridge the gap between...  ...role focuses on building scalable, reliable systems to serve machine...  ...environments, ensuring high performance and operational efficiency....  ...models Optimize model inference for speed, scalability, and... 
    Performance

    Compunnel

    Jersey City, NJ
    4 days ago
  • $200k - $250k

     ...we’re building the top-performing AI Shopping Agent that...  ..., and trust. Our ML models power the core...  ...experienced Senior MLOps Engineer to take ownership of how...  ...– for a custom-built inference platform powering a live...  ..., cost-efficient, and scalable, partnering with... 
    Performance
    Remote work
    Flexible hours

    Wizard

    New York, NY
    7 hours ago
  • $128.7k - $261.3k

     ...export, kernel development, and performance engineering so that every cycle on our...  ...models into fast, reliable inference across GPUs powering GM’s...  ...and diving into MLIR/ONNX and CUDA/TensorRT internals. We value...  ...reliable, and effortless for ML engineers across the AV... 
    Performance
    Flexible hours

    General Motors

    New York, NY
    7 hours ago
  •  ...platform helps contractors, engineering firms, and utilities...  ...of our training and inference pipelines, fortifying...  ...reliable, high-performing, and secure actionable...  ...: Design and maintain scalable architectures for serving...  ...packaging and scaling ML applications. Infrastructure... 
    Performance
    For contractors

    SewerAI Corporation

    New York, NY
    7 hours ago
  •  ...help healthcare professionals perform at their best. At Solventum,...  ...Job Description: ML Engineer. 3M Health Care is now Solventum...  ...AI services are secure and scalable. Key Responsibilities: 1....  ...for model training and inference. Feature Management: Help... 
    Performance
    H1b
    Remote work

    Solventum

    New York, NY
    3 days ago
  • $200k

     ...seeking a Machine Learning Performance Engineer to join our team, focusing on...  ...infrastructure, training, and inference challenges to advance our...  ...What you'll do: Build scalable and robust training and...  ...-level GPU programming with CUDA, including Tensor Cores, cooperative... 
    Performance
    Work at office

    Optiver

    New York, NY
    1 day ago
  • $128.7k - $261.3k

     ...model export, kernel development, and performance engineering so that every cycle on our accelerators...  ...sit at the heart of our on‑vehicle ML inference for ADAS and autonomous driving. We own...  ...implement, benchmark, and iterate on CUDA‑based kernels and custom operators to... 
    Performance
    Flexible hours

    General Motors

    New York, NY
    7 hours ago
  •  ...ML Engineer Stamford, Connecticut, United States About the Job Our client...  ...preprocessing, model training, deployment, inference, and monitoring in production...  ...ML infrastructure and processes for scalability and performance. Qualifications ~ Bachelor... 
    Performance
    Full time

    Catalyst Labs, LLC

    New York, NY
    2 days ago
  • $200k - $265k

     ...Senior Machine Learning Engineer on the AI Image...  ...machine learning and scalable ML infrastructure will be...  ...responsiveness to prompting, inference time, and...  ...experiments to benchmark model performance, tracking quality metrics...  ...ComfyUI, TensorRT, and CUDA. Experience building... 
    Performance
    Work at office

    Cantina Labs

    New York, NY
    7 days ago
  •  ...Machine Learning Engineer - Inference / Serving Join to apply for the Machine Learning Engineer...  ...Today, we are focused on bringing the performance of closed‑web user acquisition to the open...  ...CTV products. This is an applied ML systems role—equal parts engineering depth... 
    Performance
    Full time
    Remote work

    Yobi AI

    New York, NY
    7 days ago
  • $110k - $130k

     ...Machine Learning (ML) at the New York Times...  ...York Times real-time ML inference models, including both...  ...end, our partners are engineering systems that call these...  ...deploying ML models as scalable, low-latency, and highly...  ...data drift, and model performance degradation.... 
    Performance
    Local area
    Flexible hours

    New York Times

    New York, NY
    2 days ago
  • A Behavioral AI company is seeking a Machine Learning Engineer to design and optimize systems for bringing their models to life. The role involves ensuring ML models are efficient and reliable, requiring experience in model deployment and robust coding skills. Candidates... 
    Remote work

    Yobi

    New York, NY
    2 days ago
  •  ...help healthcare professionals perform at their best. At Solventum,...  ...Description: Principal ML Engineer. 3M Health Care is now Solventum...  ...performance, and automated scalability over hype. While many focus...  ...experimentation. Inference at Scale: Architect high-performance... 
    Performance
    H1b
    Remote work

    Solventum

    New York, NY
    3 days ago
  • $192.4k - $357.4k

     ...an experienced Head of Engineering, reporting to the VP...  ...technical expertise in AI/ML engineering, large-...  ...maintaining a robust, scalable, and secure ML platforms...  ..., deployment, and inference for frontier research....  ...fine-tuning, and high-performance inference.) Preferred... 
    Performance
    Local area
    Worldwide
    Relocation package

    Disability Solutions

    New York, NY
    1 day ago
  •  ...Ipro Networks Pte. Ltd. is seeking a Principal AI/ML Engineer to lead the development of AI infrastructure and support fraud...  ...development. You will guide engineering teams, ensuring scalability and performance in a dynamic setting, while representing the company externally... 
    Performance
    Remote work

    Ipro Networks Pte. Ltd.

    New York, NY
    7 hours ago
  • $200k

     ...Machine Learning Research Engineer to join our team,...  ...infrastructure, training, and inference challenges to advance...  ...Build scalable and robust training and...  ...in a supportive, high-performing environment alongside...  ...or other accelerators (CUDA, Triton, Pallas, etc.)... 
    Performance
    Work at office

    Optiver

    New York, NY
    1 day ago
  • $200.2k - $357.5k

     ...operations. We’re hiring a Staff / Senior Staff...  ...Infrastructure Engineer to lead the design...  ...of our end‑to‑end ML platform powering...  ...driving scalable innovation for customers...  ...experimentation, batch/online inference, edge) used by...  ...tied to performance, subject to plan terms... 
    Performance
    Full time
    Work at office
    Remote work
    Flexible hours

    Samsara

    New York, NY
    7 hours ago
  •  ...looking to bring on an experienced Data Engineer to join Eight Sleep’s machine learning team...  ...This role involves monitoring production ML systems, building tools and pipelines for...  ...machine learning systems to ensure performance and reliability Develop big data ETL pipelines... 
    Performance
    Full time
    Sleeping nights
    Flexible hours
    Night shift

    Eight Sleep

    New York, NY
    5 days ago
  • $300k

     ...Staff + Sr. Software Engineer, Inference San Francisco, CA | New York City, NY | Seattle, WA About Anthropic Anthropic's mission is to create...  ...research by giving our scientists the high-performance inference infrastructure they need to develop next-generation... 
    Performance
    Work at office
    Worldwide
    Visa sponsorship
    Flexible hours

    Anthropic

    New York, NY
    4 days ago
  •  ...Description As the first ML Ops Engineer at Tennr, you'll play...  ...training and inference pipelines that can...  ...is powered by robust, scalable, and efficiently deployed...  ...systems to enhance performance and efficiency....  ...inference) involving CUDA profiling, memory optimization... 
    Performance
    Work at office

    Tennr

    New York, NY
    4 days ago
