Staff ML Performance Engineer — Scalable Inference & CUDA
Modal
A leading AI infrastructure company based in New York is seeking experienced engineers to enhance the performance of ML systems and contribute to open-source projects. Ideal candidates will have over 5 years of experience in writing high-quality code and familiarity with Nvidia GPU architecture and ML frameworks. This role offers opportunities for significant growth within a fast-growing team and requires in-person collaboration in NYC, San Francisco, or Stockholm.
- ...jobs, and serve low-latency inference. We have thousands of... ...medalists, and experienced engineering and product leaders with decades... ...with experience in making ML systems performant at scale. If you are interested... ...GPU architecture and CUDA. Experience with ML performance... [Performance]
- A leading Behavioral AI company is seeking a Machine Learning Engineer focused on inference and serving. In this role, you will design and optimize systems to operationalize AI models. The ideal candidate has deep expertise in model deployment, a strong low-latency mindset... [Remote work]
$128.7k - $261.3k
...Team The Model Deployment & Inference Solutions team in GM AV... ...mission is two-fold: build the ML deployment platform that... ...automating workflows currently performed manually by engineers. Build the developer... ...at the integration level (CUDA-aware Python, TensorRT, Triton... [Performance, Flexible hours, Shift work]
- ...Reddit, Inc. is seeking a Staff Machine Learning Engineer to lead the development of a large-scale ML Inference Platform. Responsibilities include designing cloud-based ML... ...Kubernetes and ensuring reliable, low-latency performance. Candidates should have 7+ years of... [Performance]
- ...Oefentherapie is seeking a skilled Machine Learning Engineer with expertise in developing scalable and efficient ML systems. The position emphasizes collaboration... ...background in designing and architecting high-performance models. This role offers competitive salary and... [Performance]
- ...A leading cloud technology company in the United States seeks an ML Performance Engineer Principal Lead to optimize inference performance across its platforms. The role involves evaluating techniques like quantization and hardware-aware scheduling. Ideal candidates will... [Performance]
- ...About the Role As an ML Research Engineer at Maple, you'll be a... ...automated systems to monitor performance, detect anomalies,... ...optimized production inference. Lead evaluations,... ...robustness and scalability. Balance research innovation... ...experience with CUDA/Triton preferred. Proven... [Performance, Work at office, Local area]
- ...join their Technology team. The role involves designing high-performance infrastructure for generative AI and machine learning workloads... ...should have a relevant degree and 3-7 years of experience in scalable systems. The position offers competitive compensation, health... [Performance]
$128.7k - $261.3k
...seeks a skilled professional to develop its ML deployment platform within the... ...deployment from training to on-vehicle inference and enhancing developer experience through... ...from $128,700 to $261,300 with additional performance bonuses and a comprehensive benefits package... [Performance]
- ...Ocient is seeking a Senior Software Engineer - Machine Learning & Geospatial to enhance its ML capabilities. This fully remote position focuses on identifying... ...to popular ML frameworks, ensuring efficient performance at scale. The ideal candidate has over 5 years of... [Performance, Remote work]
- ...Summary: We are seeking an ML Engineer to bridge the gap between... ...role focuses on building scalable, reliable systems to serve machine... ...environments, ensuring high performance and operational efficiency.... ...models Optimize model inference for speed, scalability, and... [Performance]
$200k - $250k
...we’re building the top-performing AI Shopping Agent that... ..., and trust. Our ML models power the core... ...experienced Senior MLOps Engineer to take ownership of how... ...– for a custom-built inference platform powering a live... ..., cost-efficient, and scalable, partnering with... [Performance, Remote work, Flexible hours]
$128.7k - $261.3k
...export, kernel development, and performance engineering so that every cycle on our... ...models into fast, reliable inference across GPUs powering GM’s... ...and diving into MLIR/ONNX and CUDA/TensorRT internals. We value... ...reliable, and effortless for ML engineers across the AV... [Performance, Flexible hours]
- ...platform helps contractors, engineering firms, and utilities... ...of our training and inference pipelines, fortifying... ...reliable, high-performing, and secure actionable... ...: Design and maintain scalable architectures for serving... ...packaging and scaling ML applications. Infrastructure... [Performance, For contractors]
- ...help healthcare professionals perform at their best. At Solventum,... ...Job Description: ML Engineer 3M Health Care is now Solventum... ...AI services are secure and scalable. Key Responsibilities 1.... ...for model training and inference. Feature Management: Help... [Performance, H1b, Remote work]
$200k
...seeking a Machine Learning Performance Engineer to join our team, focusing on... ...infrastructure, training, and inference challenges to advance our... ...What you'll do: Build scalable and robust training and... ...-level GPU programming with CUDA, including Tensor Cores, cooperative... [Performance, Work at office]
$128.7k - $261.3k
...model export, kernel development, and performance engineering so that every cycle on our accelerators... ...sit at the heart of our on‑vehicle ML inference for ADAS and autonomous driving. We own... ...implement, benchmark, and iterate on CUDA‑based kernels and custom operators to... [Performance, Flexible hours]
- ...ML Engineer Stamford, Connecticut, United States About the Job Our client... ...preprocessing, model training, deployment, inference, and monitoring in production... ...ML infrastructure and processes for scalability and performance. Qualifications ~ Bachelor... [Performance, Full time]
$200k - $265k
...Senior Machine Learning Engineer on the AI Image... ...machine learning and scalable ML infrastructure will be... ...responsiveness to prompting, inference time, and... ...experiments to benchmark model performance, tracking quality metrics... ...ComfyUI, TensorRT, and CUDA. Experience building... [Performance, Work at office]
- ...Machine Learning Engineer - Inference / Serving Join to apply for the Machine Learning Engineer... ...Today, we are focused on bringing the performance of closed‑web user acquisition to the open... ...CTV products. This is an applied ML systems role—equal parts engineering depth... [Performance, Full time, Remote work]
$110k - $130k
...Machine Learning (ML) at the New York Times... ...York Times real-time ML inference models, including both... ...end, our partners are engineering systems that call these... ...deploying ML models as scalable, low-latency, and highly... ...data drift, and model performance degradation.... [Performance, Local area, Flexible hours]
- A Behavioral AI company is seeking a Machine Learning Engineer to design and optimize systems for bringing their models to life. The role involves ensuring ML models are efficient and reliable, requiring experience in model deployment and robust coding skills. Candidates... [Remote work]
- ...help healthcare professionals perform at their best. At Solventum,... ...Description: Principal ML Engineer 3M Health Care is now Solventum... ...performance, and automated scalability over hype. While many focus... ...experimentation. Inference at Scale: Architect high-performance... [Performance, H1b, Remote work]
$192.4k - $357.4k
...an experienced Head of Engineering, reporting to the VP... ...technical expertise in AI/ML engineering, large-... ...maintaining a robust, scalable, and secure ML platforms... ..., deployment, and inference for frontier research.... ...fine-tuning, and high-performance inference.) Preferred... [Performance, Local area, Worldwide, Relocation package]
- ...Ipro Networks Pte. Ltd. is seeking a Principal AI/ML Engineer to lead the development of AI infrastructure and support fraud... ...development. You will guide engineering teams, ensuring scalability and performance in a dynamic setting, while representing the company externally... [Performance, Remote work]
$200k
...Machine Learning Research Engineer to join our team,... ...infrastructure, training, and inference challenges to advance... ...Build scalable and robust training and... ...in a supportive, high-performing environment alongside... ...or other accelerators (CUDA, Triton, Pallas, etc.)... [Performance, Work at office]
$200.2k - $357.5k
...operations. We’re hiring a Staff / Senior Staff... ...Infrastructure Engineer to lead the design... ...of our end‑to‑end ML platform powering... ...driving scalable innovation for customers... ...experimentation, batch/online inference, edge) used by... ...tied to performance, subject to plan terms... [Performance, Full time, Work at office, Remote work, Flexible hours]
- ...looking to bring on an experienced Data Engineer to join Eight Sleep’s machine learning team... ...This role involves monitoring production ML systems, building tools and pipelines for... ...machine learning systems to ensure performance and reliability Develop big data ETL pipelines... [Performance, Full time, Sleeping nights, Flexible hours, Night shift]
$300k
...Staff + Sr. Software Engineer, Inference San Francisco, CA | New York City, NY | Seattle, WA About Anthropic Anthropic's mission is to create... ...research by giving our scientists the high-performance inference infrastructure they need to develop next-generation... [Performance, Work at office, Worldwide, Visa sponsorship, Flexible hours]
- ...Description As the first ML Ops Engineer at Tennr, you'll play... ...training and inference pipelines that can... ...is powered by robust, scalable, and efficiently deployed... ...systems to enhance performance and efficiency.... ...inference) involving CUDA profiling, memory optimization... [Performance, Work at office]

