Staff ML Performance Engineer Scalable Inference & CUDA

Modal

A leading AI infrastructure company based in New York is seeking experienced engineers to improve the performance of ML systems and contribute to open-source projects. Ideal candidates will have more than 5 years of experience writing high-quality code and familiarity with NVIDIA GPU architecture and ML frameworks. This role offers opportunities for significant growth within a fast-growing team and requires in-person collaboration in NYC, San Francisco, or Stockholm.

Vacancy posted 6 hours ago

Similar jobs that could be interesting for you, based on the Staff ML Performance Engineer Scalable Inference & CUDA vacancy in New York, NY

Senior ML Performance Engineer - GPU & Inference
...jobs, and serve low-latency inference. We have thousands of... ...medalists, and experienced engineering and product leaders with decades... ...with experience in making ML systems performant at scale. If you are interested... ...GPU architecture and CUDA. Experience with ML performance...
Performance
Modal Labs
New York, NY
4 days ago
Senior ML Engineer: Scalable AI Platform (Remote, Equity)
$140k - $180k
...Artera is seeking a Machine Learning Engineer to develop scalable pipelines for model training and evaluation, collaborate with AI teams, and optimize model performance. The ideal candidate will have over 5 years of software engineering experience and strong expertise...
Performance
Remote work
Artera Corporation
New York, NY
2 days ago
Senior ML Inference Engineer - Platform
$128.7k - $261.3k
...Team The Model Deployment & Inference Solutions team in GM AV... ...mission is two-fold: build the ML deployment platform that... ...automating workflows currently performed manually by engineers. Build the developer... ...at the integration level (CUDA-aware Python, TensorRT, Triton...
Performance
Flexible hours
Shift work
General Motors
New York, NY
3 days ago
ML Infrastructure Engineer
...the first and founding ML Operations Engineer at Tennr, you’ll play... ...training and inference pipelines that can handle... ...is powered by robust, scalable, and efficiently deployed... ...systems to enhance performance and efficiency.... ...inference) involving CUDA profiling, memory optimization...
Performance
Work at office
Tennr
New York, NY
1 day ago
GenAI ML Infra Engineer Scalable AI Systems
...join their Technology team. The role involves designing high-performance infrastructure for generative AI and machine learning workloads... ...should have a relevant degree and 3-7 years of experience in scalable systems. The position offers competitive compensation, health...
Performance
Point72 Asset Management, L.P
New York, NY
6 hours ago
ML Research Engineer
...About the Role As an ML Research Engineer at Maple, you'll be a... ...automated systems to monitor performance, detect anomalies,... ...optimized production inference. Lead evaluations,... ...robustness and scalability. Balance research innovation... ...experience with CUDA/Triton preferred. Proven...
Performance
Work at office
Local area
Maple
New York, NY
3 days ago
Senior ML Inference Platform Engineer - Real-Time AV
$128.7k - $261.3k
...seeks a skilled professional to develop its ML deployment platform within the... ...deployment from training to on-vehicle inference and enhancing developer experience through... ...from $128,700 to $261,300 with additional performance bonuses and a comprehensive benefits package...
Performance
General Motors
New York, NY
3 days ago
Senior Machine Learning Engineer (Inference Platform)
$200k - $250k
...we’re building the top-performing AI Shopping Agent that... ..., and trust. Our ML models power the core... ...experienced Senior MLOps Engineer to take ownership of how... ...– for a custom-built inference platform powering a live... ..., cost-efficient, and scalable, partnering with...
Performance
Remote work
Flexible hours
Wizard
New York, NY
2 days ago
Machine Learning Performance Engineer
...seeking a Machine Learning Performance Engineer to join our team, focusing on... ...infrastructure, training, and inference challenges to advance our... ...strategies. Responsibilities Build scalable and robust training and... ...-level GPU programming with CUDA, including Tensor Cores,...
Performance
Work at office
Optiver Holding BV
New York, NY
1 day ago
ML Engineer, Generative Video
...the Role Mirage is seeking an ML Engineer to build and scale the... ...objectives, scaling strategies, and inference optimization and efficiency... ...Monitor and improve model performance in real‑world usage What... ...infrastructure Expertise in PyTorch, CUDA, Triton, and distributed...
Performance
Full time
Local area
Night shift
Mirage
New York, NY
1 day ago
Senior ML Accelerator Engineer - GPU
$128.7k - $261.3k
...model export, kernel development, and performance engineering so that every cycle on our accelerators... ...sit at the heart of our on‑vehicle ML inference for ADAS and autonomous driving. We own... ...implement, benchmark, and iterate on CUDA‑based kernels and custom operators to...
Performance
Flexible hours
General Motors
New York, NY
3 days ago
Machine Learning Engineer, Images
$200k - $265k
...Senior Machine Learning Engineer on the AI Image... ...machine learning and scalable ML infrastructure will be... ...responsiveness to prompting, inference time, and... ...experiments to benchmark model performance, tracking quality metrics... ...ComfyUI, TensorRT, and CUDA. ~ Experience...
Performance
Work at office
Cantina
New York, NY
2 days ago
ML Engineer
...Summary: We are seeking an ML Engineer to bridge the gap between... ...role focuses on building scalable, reliable systems to serve machine... ...environments, ensuring high performance and operational efficiency.... ...models Optimize model inference for speed, scalability, and...
Performance
Compunnel
Jersey City, NJ
2 days ago
ML Model Serving Engineer - High-Performance Inference
$175k - $280k
...layer, integrating LLM, speech, and vision models. The ideal candidate has significant experience in systems programming and performance engineering, aiming to improve high-throughput, low-latency serving. Join a team dedicated to pioneering advancements in voice agents...
Performance
SESAME
New York, NY
6 hours ago
Machine Learning Engineer - Inference / Serving
...Machine Learning Engineer - Inference / Serving Join to apply for the Machine Learning Engineer... ...Today, we are focused on bringing the performance of closed‑web user acquisition to the open... ...CTV products. This is an applied ML systems role—equal parts engineering depth...
Performance
Full time
Remote work
Yobi AI
New York, NY
6 hours ago
ML Engineer
...ML Engineer Georgia, Georgia, United States About the Job Our client is a... ...preprocessing, model training, deployment, inference, and monitoring in production... ...ML infrastructure and processes for scalability and performance. Qualifications ~ Bachelor...
Performance
Full time
Catalyst Labs, LLC
New York, NY
5 days ago
ML Ops Engineer, Machine Learning & AI
$110k - $130k
...Machine Learning (ML) at the New York Times... ...York Times real-time ML inference models, including both... ...end, our partners are engineering systems that call these... ...deploying ML models as scalable, low-latency, and highly... ...data drift, and model performance degradation....
Performance
Local area
Flexible hours
New York Times
New York, NY
5 days ago
Staff ML Compiler Engineer
$185.1k - $335.3k
...kernel development, and performance engineering so that every cycle on... ...into fast, reliable inference across GPUs powering GM... ...into MLIR/ONNX and CUDA/TensorRT internals, and... ...driving. The Role As a Staff Compiler Engineer on the... ..., and effortless for ML engineers across the...
Performance
Flexible hours
General Motors
New York, NY
3 days ago
ML Ops Engineer (AI)
...platform helps contractors, engineering firms, and utilities... ...of our training and inference pipelines, fortifying... ...reliable, high-performing, and secure actionable... ...: Design and maintain scalable architectures for serving... ...packaging and scaling ML applications. Infrastructure...
Performance
For contractors
SewerAI Corporation
New York, NY
3 days ago
Machine Learning Research Engineer
...Machine Learning Research Engineer to join our team,... ...infrastructure, training, and inference challenges to advance... ...forecasting Build scalable and robust training... ...in a supportive, high-performing environment alongside... ...or other accelerators (CUDA, Triton, Pallas, etc.)...
Performance
Work at office
Optiver Holding BV
New York, NY
1 day ago
Staff ML Engineer - ML Infrastructure
$200.2k - $357.5k
...operations. We’re hiring a Staff / Senior Staff... ...Infrastructure Engineer to lead the design... ...of our end‑to‑end ML platform powering... ...driving scalable innovation for customers... ...experimentation, batch/online inference, edge) used by... ...tied to performance, subject to plan terms...
Performance
Full time
Work at office
Remote work
Flexible hours
Samsara
New York, NY
3 days ago
ML Data Engineer — Build Scalable ML Pipelines
...looking to bring on an experienced Data Engineer to join Eight Sleep’s machine learning team... ...This role involves monitoring production ML systems, building tools and pipelines for... ...machine learning systems to ensure performance and reliability Develop big data ETL pipelines...
Performance
Full time
Sleeping nights
Flexible hours
Night shift
Eight Sleep
New York, NY
3 days ago
ML Research Engineer
$160k - $200k
...layer that can accurately and scalably synthesize information from... ...We’re hiring an exceptional ML Engineer to join our team (Boston or... ...efficient, secure, reliable, and performant ML pipelines and... ...systems (design, training, inference, deployment, and monitoring;...
Performance
Work at office
Verana Health
New York, NY
5 days ago
Principal ML Engineer (Applied/Systems)
...highly experienced Principal ML Engineer (Applied / Systems) to join... ...turning them into robust, scalable production systems that serve... ...all with a strong focus on performance, maintainability, and impact... ...Build, optimize, and scale inference pipelines and model serving...
Performance
Soris
New York, NY
5 days ago
Senior ML Engineer (ML Infrastructure & Data Systems) @ Early-stage Robotics Startup
$175k - $250k
...Senior Machine Learning Engineer (ML Infrastructure & Data Systems... ...continuously improving system performance through tight feedback... ...performance Ensure reliability, scalability, and high availability of... ...and scaled ML training and inference systems in production...
Performance
Right Hand Talent
Brooklyn, NY
2 days ago
ML Research Engineer, ML Systems
...ML Research Engineer, ML Systems Scale's ML platform (RLXF) team builds... ...language model training and inference. The platform has been powering... ...and tools such as CUDA, PyTorch, transformers, flash... ...qualifications, interview performance, and relevant education or...
Performance
Scale AI
New York, NY
2 days ago
AI/ML Engineer
...AI/ML Engineer – Develop LLM-powered solutions to enhance RAG, entity... ...with LLMs, monitoring performance through telemetry, and continuously... ...requirements into robust, scalable AI/ML solutions. This is a hands... ...telemetry from AI Gateway Inference Tables and custom logs,...
Performance
Worldwide
Lakefusion
New York, NY
2 days ago
AI/ML Engineer Intern
...What you will do As our ML Engineer Intern, you'll be the... ...product ecosystem Designing scalable ML infrastructure and pipelines... ...media datasets Implementing inference systems for content... ...systems Optimizing model performance for cost efficiency while maintaining...
Performance
Contract work
Internship
Immediate start
Remote work
Worldwide
Melotech
New York, NY
2 days ago
Senior AI/ML Engineer, Global Banking & Markets, Front Office Technology
$150k - $300k
...We Do At Goldman Sachs, our Engineers don't just make things - we... ...teams that build massively scalable software and systems,... ...microservices. Scalability & Performance: Optimize inference latency and manage token costs... ...least 3 years focused on AI/ML integration in production....
Performance
Immediate start
The Goldman Sachs Group
New York, NY
6 hours ago
AI/ML Engineer
...contractual opportunity for an AI / ML Engineer located at the client site in... ...models, integrating them into scalable systems, and ensuring their reliability and performance through robust ML Ops... ...model serving: NVIDIA Triton Inference Server; vLLM; or Ollama. Knowledge...
Performance
Work at office
Trigyn Technologies
New York, NY
4 days ago