Staff ML Performance Engineer — Scalable Inference & CUDA

Modal

A leading AI infrastructure company based in New York is seeking experienced engineers to improve the performance of ML systems and contribute to open-source projects. Ideal candidates will have more than 5 years of experience writing high-quality code and familiarity with NVIDIA GPU architecture and ML frameworks. The role offers significant growth within a fast-growing team and requires in-person collaboration in NYC, San Francisco, or Stockholm.

Vacancy posted 3 days ago

Similar jobs that could be interesting for you
Based on the Staff ML Performance Engineer — Scalable Inference & CUDA vacancy in New York, NY

ML Research Engineer
...the Role As an ML Research Engineer at Maple, you'll be... ...automated systems to monitor performance, detect anomalies,... ...optimized production inference. Lead evaluations,... ...robustness and scalability. Balance research... ...optimization experience with CUDA/Triton preferred. ~...
Performance
Work at office
Local area
Maple AI, Inc
New York, NY
4 days ago
ML Infrastructure Engineer
...the first and founding ML Operations Engineer at Tennr, you’ll play... ...training and inference pipelines that can handle... ...is powered by robust, scalable, and efficiently deployed... ...systems to enhance performance and efficiency.... ...inference) involving CUDA profiling, memory optimization...
Performance
Work at office
Tennr
New York, NY
5 days ago
ML Engineer
...and deploy production‑grade ML systems with end‑to‑end... ...model training, deployment, inference, and monitoring in production... ...infrastructure and processes for scalability and performance. Qualifications Bachelor’s... ...experience in ML engineering. Strong programming skills...
Performance
Full time
Catalyst Labs, LLC
New York, NY
4 days ago
GenAI ML Infra Engineer — Scalable AI Systems
...join their Technology team. The role involves designing high-performance infrastructure for generative AI and machine learning workloads... ...should have a relevant degree and 3-7 years of experience in scalable systems. The position offers competitive compensation, health...
Performance
Point72 Asset Management, L.P
New York, NY
1 day ago
ML Engineer
...help healthcare professionals perform at their best. At Solventum,... ...Job Description: ML Engineer. 3M Health Care is now Solventum... ...AI services are secure and scalable. Key Responsibilities: 1.... ...for model training and inference. Feature Management: Help...
Performance
H1b
Remote work
Solventum
New York, NY
1 day ago
Machine Learning Performance Engineer
$200k
...seeking a Machine Learning Performance Engineer to join our team, focusing on... ...infrastructure, training, and inference challenges to advance our... ...What you'll do: Build scalable and robust training and... ...-level GPU programming with CUDA, including Tensor Cores, cooperative...
Performance
Work at office
Optiver
New York, NY
5 days ago
Senior Machine Learning (ML) Engineer
...Machine Learning / Software Engineer Dyania Health is a... ...mission. As a senior ML engineer at Dyania,... ..., build, and deploy scalable ML-driven systems that... ...optimization, deployment, and inference at scale. Architect... ...model and system performance; communicate findings...
Performance
Internship
Local area
Remote work
Flexible hours
Shift work
HealthX Ventures
Jersey City, NJ
3 days ago
Staff ML Infra Engineer: Low-Latency, Scalable Platform
Tubi Tv is seeking a Software Engineer specializing in ML Infra & Distributed Systems to enhance their... ...and ML teams, you will design high-performance, low-latency systems that power... ...Ideal candidates have experience in scalable system design and an enthusiasm for...
Performance
Tubi Tv
New York, NY
3 days ago
Machine Learning Research Engineer
$200k
...Machine Learning Research Engineer to join our team,... ...infrastructure, training, and inference challenges to advance... ...Build scalable and robust training and... ...in a supportive, high-performing environment alongside... ...or other accelerators (CUDA, Triton, Pallas, etc.)...
Performance
Work at office
Optiver
New York, NY
5 days ago
ML Inference Performance Engineer
The Consensus is looking for a Software Engineer focused on ML performance to join our team in New York. This role involves working with cutting-edge AI technologies and optimizing ML models, particularly large language models (LLMs). Ideal candidates will possess strong...
Performance
Flexible hours
The Consensus
New York, NY
5 days ago
Senior Machine Learning Engineer (Inference Platform)
$200k - $250k
...we’re building the top-performing AI Shopping Agent that... ..., and trust. Our ML models power the core... ...experienced Senior MLOps Engineer to take ownership of how... ...- for a custom-built inference platform powering a live... ..., cost-efficient, and scalable, partnering with...
Performance
Remote work
Flexible hours
Wizard
New York, NY
3 days ago
Machine Learning Engineer, Images
$200k - $265k
...Senior Machine Learning Engineer on the AI Image... ...machine learning and scalable ML infrastructure will be... ...responsiveness to prompting, inference time, and... ...experiments to benchmark model performance, tracking quality metrics... ...ComfyUI, TensorRT, and CUDA. Experience building...
Performance
Work at office
Cantina
New York, NY
2 days ago
ML Ops Engineer (AI)
...platform helps contractors, engineering firms, and utilities... ...of our training and inference pipelines, fortifying... ...reliable, high-performing, and secure actionable... ...: Design and maintain scalable architectures for serving... ...packaging and scaling ML applications. Infrastructure...
Performance
For contractors
SewerAI Corporation
New York, NY
3 days ago
Machine Learning Engineer - Inference / Serving
...Today, we are focused on bringing the performance of closed‑web user acquisition to the... ...and CTV products. This is an applied ML systems role—equal parts engineering...
Performance
Full time
Remote work
Yobi AI
New York, NY
1 day ago
ML Model Serving Engineer - High-Performance Inference
$175k - $280k
...layer, integrating LLM, speech, and vision models. The ideal candidate has significant experience in systems programming and performance engineering, aiming to improve high-throughput, low-latency serving. Join a team dedicated to pioneering advancements in voice agents...
Performance
Sesame
New York, NY
8 days ago
AI/ML Engineer
...AI/ML Engineer We are seeking a highly skilled Senior Developer... ...engineering expertise in building scalable data systems and good... ...and consistency. Ensure performance and stability of LLM-based components... ...LLMOps tools and scalable inference strategies. Prior work...
Performance
Local area
RIT Solutions
New York, NY
2 days ago
ML Ops Engineer, Machine Learning & AI
$110k - $130k
...: Machine Learning (ML) at the New York Times... ...York Times real-time ML inference models, including both... ...end, our partners are engineering systems that call... ...deploying ML models as scalable, low-latency, and highly... ...data drift, and model performance degradation. *...
Performance
Full time
Local area
Flexible hours
The New York Times
New York, NY
4 hours ago
ML Research Engineer
$160k - $200k
...layer that can accurately and scalably synthesize information from... ...We’re hiring an exceptional ML Engineer to join our team (Boston or... ...efficient, secure, reliable, and performant ML pipelines and... ...systems (design, training, inference, deployment, and monitoring;...
Performance
Work at office
Verana Health
New York, NY
5 days ago
Staff ML Engineer - ML Infrastructure
$200.2k - $357.5k
...operations. We’re hiring a Staff / Senior Staff... ...Infrastructure Engineer to lead the design... ...of our end-to-end ML platform powering... ...batch and online inference, and edge deployment... ...and operate scalable online and batch inference... ...tied to performance, subject to plan terms...
Performance
Full time
Work at office
Remote work
Flexible hours
Samsara
New York, NY
3 days ago
Lead AI/ML Engineer
$170k - $190k
...interruption handling, streaming inference, and audio quality, and... ...translate these into scalable, enterprise-grade... ...production Improve model performance and inference workflows... ...the team, mentoring engineers and promoting best practices in ML engineering Partner with...
Performance
Remote work
ASAPP
New York, NY
2 days ago
Senior ML Engineer
...needs. Collaborate with data scientists and software engineers to design and implement scalable and efficient solutions. Clean, preprocess, and analyze... ...into production environments and monitor their performance. Continuously improve model accuracy and performance...
Performance
Resolve Tech Solutions
New York, NY
2 days ago
Machine Learning Engineer
$160k - $230k
...Core Linux · Low Latency · Network Engineering AI/ML Solutions Architect – Distributed Training... ...training, multi-GPU systems, and scalable AI inference infrastructure. You'll work directly... ..., you'll: Design and deploy high-performance ML pipelines across hundreds/thousands...
Performance
Full time
Remote work
Doghouse Recruitment
New York, NY
3 days ago
Machine Learning Engineer
...Machine Learning Engineer ExaCare Inc – New York, New... ...processes that enable ML to move from research... ...turn their work into scalable, maintainable, and cost... ...support model training and inference Build tooling and... ...for monitoring model performance, system reliability,...
Performance
Flexible hours
ExaCare Inc
New York, NY
1 day ago
Machine Learning Engineer
$200k - $300k
...Hiring: Machine Learning Engineer II (Autonomous... ...mission by developing scalable, production-grade models... ...to building end-to-end ML systems for large-scale... ...teams to ensure model performance in simulation and on-vehicle... ...The TalentHaus by 2x Inferred from the description...
Performance
Full time
Immediate start
Remote work
The TalentHaus
New York, NY
3 days ago
Senior Machine Learning Engineer
$153k - $198k
...Senior Machine Learning Engineer, you will own the end to end ML lifecycle at Button, from... ...for latency, scalability, cost efficiency, reproducibility... ...workflows, model deployment, inference services, monitoring,... ...services with clear performance, reliability, and latency...
Performance
Local area
Button
New York, NY
3 days ago
Machine Learning Engineer
$210k - $250k
...layer that can accurately and scalably synthesize information from... ...We’re hiring an exceptional ML Engineer to join our team (Boston or... ...models (methods to detect drift/performance degradation; develop... ...systems (design, training, inference, deployment, and monitoring;...
Performance
Work at office
Verana Health
New York, NY
5 days ago
Machine Learning Engineer
$150k - $215k
...team combining world‑class engineers with veteran strategists who... ...augmentation at scale. Our ML team builds the services and... ...tuning models to deploying high‑performance inference services, and we operate... ...driving the development of scalable ML services for enrichment....
Performance
Permanent employment
Contract work
For contractors
For subcontractor
Work at office
Remote work
Vannevar Labs
New York, NY
3 days ago
Machine Learning Performance Engineer
...We are looking for an engineer with experience in low-level... ...to join our growing ML team. Machine learning... ...here is optimising the performance of our models - both training and inference. We care about efficient... ...straightforward CUDA, but the interesting part...
Performance
Jane Street
New York, NY
4 days ago
ML Engineer
...Windmill is building the future of performance. Windmill is the first context graph... ...Deployment: Design, build, and deploy scalable machine learning models to enhance product... ...closely with data scientists, software engineers, and founders to integrate machine...
Performance
Work at office
Relocation
WindMill
New York, NY
5 days ago
Senior Machine Learning Engineer
...Senior Machine Learning Engineer Disney... ...distributed data and ML infrastructure that supports... ...adjacent services such as inference inputs, feature APIs,... ...layers. Contribute to scalable service patterns including... ...system availability, performance, and cost efficiency....
Performance
Worldwide
Walt Disney Company
New York, NY
4 days ago