Staff ML Performance Engineer — Scalable Inference & CUDA
Modal
A leading AI infrastructure company based in New York is seeking experienced engineers to enhance the performance of ML systems and contribute to open-source projects. Ideal candidates will have over 5 years of experience in writing high-quality code and familiarity with Nvidia GPU architecture and ML frameworks. This role offers opportunities for significant growth within a fast-growing team and requires in-person collaboration in NYC, San Francisco, or Stockholm.
- ...the Role As an ML Research Engineer at Maple, you'll be... ...automated systems to monitor performance, detect anomalies,... ...optimized production inference. Lead evaluations,... ...robustness and scalability. Balance research... ...optimization experience with CUDA/Triton preferred. ~... Performance · Work at office · Local area
- ...the first and founding ML Operations Engineer at Tennr, you’ll play... ...training and inference pipelines that can handle... ...is powered by robust, scalable, and efficiently deployed... ...systems to enhance performance and efficiency.... ...inference) involving CUDA profiling, memory optimization... Performance · Work at office
- ...and deploy production‑grade ML systems with end‑to‑end... ...model training, deployment, inference, and monitoring in production... ...infrastructure and processes for scalability and performance. Qualifications Bachelor’s... ...experience in ML engineering. Strong programming skills... Performance · Full time
- ...join their Technology team. The role involves designing high-performance infrastructure for generative AI and machine learning workloads... ...should have a relevant degree and 3-7 years of experience in scalable systems. The position offers competitive compensation, health... Performance
- ...help healthcare professionals perform at their best. At Solventum,... ...Job Description: ML Engineer. 3M Health Care is now Solventum... ...AI services are secure and scalable. Key Responsibilities 1.... ...for model training and inference. Feature Management: Help... Performance · H1b · Remote work
$200k
...seeking a Machine Learning Performance Engineer to join our team, focusing on... ...infrastructure, training, and inference challenges to advance our... ...What you'll do: Build scalable and robust training and... ...-level GPU programming with CUDA, including Tensor Cores, cooperative... Performance · Work at office
- ...Machine Learning / Software Engineer Dyania Health is a... ...mission. As a senior ML engineer at Dyania,... ..., build, and deploy scalable ML-driven systems that... ...optimization, deployment, and inference at scale. Architect... ...model and system performance; communicate findings... Performance · Internship · Local area · Remote work · Flexible hours · Shift work
- Tubi Tv is seeking a Software Engineer specializing in ML Infra & Distributed Systems to enhance their... ...and ML teams, you will design high-performance, low-latency systems that power... ...Ideal candidates have experience in scalable system design and an enthusiasm for... Performance
$200k
...Machine Learning Research Engineer to join our team,... ...infrastructure, training, and inference challenges to advance... ...Build scalable and robust training and... ...in a supportive, high-performing environment alongside... ...or other accelerators (CUDA, Triton, Pallas, etc.)... Performance · Work at office
- The Consensus is looking for a Software Engineer focused on ML performance to join our team in New York. This role involves working with cutting-edge AI technologies and optimizing ML models, particularly large language models (LLMs). Ideal candidates will possess strong... Performance · Flexible hours
$200k - $250k
...we’re building the top-performing AI Shopping Agent that... ..., and trust. Our ML models power the core... ...experienced Senior MLOps Engineer to take ownership of how... ...- for a custom-built inference platform powering a live... ..., cost-efficient, and scalable, partnering with... Performance · Remote work · Flexible hours
$200k - $265k
...Senior Machine Learning Engineer on the AI Image... ...machine learning and scalable ML infrastructure will be... ...responsiveness to prompting, inference time, and... ...experiments to benchmark model performance, tracking quality metrics... ...ComfyUI, TensorRT, and CUDA. Experience building... Performance · Work at office
- ...platform helps contractors, engineering firms, and utilities... ...of our training and inference pipelines, fortifying... ...reliable, high-performing, and secure actionable... ...: Design and maintain scalable architectures for serving... ...packaging and scaling ML applications. Infrastructure... Performance · For contractors
- Machine Learning Engineer - Inference / Serving Join to apply for the Machine Learning Engineer - Inference... ...Today, we are focused on bringing the performance of closed‑web user acquisition to the... ...and CTV products. This is an applied ML systems role—equal parts engineering... Performance · Full time · Remote work
$175k - $280k
...layer, integrating LLM, speech, and vision models. The ideal candidate has significant experience in systems programming and performance engineering, aiming to improve high-throughput, low-latency serving. Join a team dedicated to pioneering advancements in voice agents... Performance
- ...AI/ML Engineer We are seeking a highly skilled Senior Developer... ...engineering expertise in building scalable data systems and good... ...and consistency. Ensure performance and stability of LLM-based components... ...LLMOps tools and scalable inference strategies. Prior work... Performance · Local area
$110k - $130k
...: Machine Learning (ML) at the New York Times... ...York Times real-time ML inference models, including both... ...end, our partners are engineering systems that call... ...deploying ML models as scalable, low-latency, and highly... ...data drift, and model performance degradation. ... Performance · Full time · Local area · Flexible hours
$160k - $200k
...layer that can accurately and scalably synthesize information from... ...We’re hiring an exceptional ML Engineer to join our team (Boston or... ...efficient, secure, reliable, and performant ML pipelines and... ...systems (design, training, inference, deployment, and monitoring;... Performance · Work at office
$200.2k - $357.5k
...operations. We’re hiring a Staff / Senior Staff... ...Infrastructure Engineer to lead the design... ...of our end-to-end ML platform powering... ...batch and online inference, and edge deployment... ...and operate scalable online and batch inference... ...tied to performance, subject to plan terms... Performance · Full time · Work at office · Remote work · Flexible hours
$170k - $190k
...interruption handling, streaming inference, and audio quality, and... ...translate these into scalable, enterprise-grade... ...production Improve model performance and inference workflows... ...the team, mentoring engineers and promoting best practices in ML engineering Partner with... Performance · Remote work
- ...needs. Collaborate with data scientists and software engineers to design and implement scalable and efficient solutions. Clean, preprocess, and analyze... ...into production environments and monitor their performance. Continuously improve model accuracy and performance... Performance
$160k - $230k
...Core Linux · Low Latency · Network Engineering AI/ML Solutions Architect – Distributed Training... ...training, multi-GPU systems, and scalable AI inference infrastructure. You'll work directly... ..., you'll: Design and deploy high-performance ML pipelines across hundreds/thousands... Performance · Full time · Remote work
- ...Machine Learning Engineer ExaCare Inc – New York, New... ...processes that enable ML to move from research... ...turn their work into scalable, maintainable, and cost... ...support model training and inference Build tooling and... ...for monitoring model performance, system reliability,... Performance · Flexible hours
$200k - $300k
...Hiring: Machine Learning Engineer II (Autonomous... ...mission by developing scalable, production-grade models... ...to building end-to-end ML systems for large-scale... ...teams to ensure model performance in simulation and on-vehicle... ...The TalentHaus by 2x Inferred from the description... Performance · Full time · Immediate start · Remote work
$153k - $198k
...Senior Machine Learning Engineer, you will own the end to end ML lifecycle at Button, from... ...for latency, scalability, cost efficiency, reproducibility... ...workflows, model deployment, inference services, monitoring,... ...services with clear performance, reliability, and latency... Performance · Local area
$210k - $250k
...layer that can accurately and scalably synthesize information from... ...We’re hiring an exceptional ML Engineer to join our team (Boston or... ...models (methods to detect drift/performance degradation; develop... ...systems (design, training, inference, deployment, and monitoring;... Performance · Work at office
$150k - $215k
...team combining world‑class engineers with veteran strategists who... ...augmentation at scale. Our ML team builds the services and... ...tuning models to deploying high‑performance inference services, and we operate... ...driving the development of scalable ML services for enrichment.... Performance · Permanent employment · Contract work · For contractors · For subcontractor · Work at office · Remote work
- ...We are looking for an engineer with experience in low-level... ...to join our growing ML team. Machine learning... ...here is optimising the performance of our models - both training and inference. We care about efficient... ...straightforward CUDA, but the interesting part... Performance
- ...Windmill is building the future of performance. Windmill is the first context graph... ...Deployment : Design, build, and deploy scalable machine learning models to enhance product... ...closely with data scientists, software engineers, and founders to integrate machine... Performance · Work at office · Relocation
- ...Senior Machine Learning Engineer Disney... ...distributed data and ML infrastructure that supports... ...adjacent services such as inference inputs, feature APIs,... ...layers. Contribute to scalable service patterns including... ...system availability, performance, and cost efficiency.... Performance · Worldwide

