Machine Learning Engineer - Inference Optimization | Experienced Hire
Susquehanna International Group LLP

Overview

We are looking for a Machine Learning Engineer focused on low-latency inference optimization to help build, tune, and productionize high-performance model serving systems. This role sits at the intersection of machine learning, systems engineering, and GPU performance. You will work on inference workloads where latency, throughput, reliability, and hardware efficiency all matter, and where a deep understanding of modern inference runtimes can meaningfully improve production outcomes.

You will work closely with quantitative researchers and engineers to understand model structure, identify inference bottlenecks, and turn research ideas into efficient production systems. The work focuses on transformer-style architectures and structured inference workloads, though it may involve other types of models. You will evaluate and tune frameworks and related serving or compilation systems, while also reasoning about GPU execution, memory layout, batching strategies, precision tradeoffs, and end-to-end latency.

What you'll do

- Design, build, and optimize low-latency inference systems for production machine learning workloads.
- Profile model inference pipelines across model execution, runtime configuration, batching, memory movement, serialization, networking, and I/O.
- Evaluate, integrate, and tune inference runtime systems.
- Improve latency, throughput, and GPU utilization for production inference workloads.
- Build and support benchmarking and profiling tools to compare model variants, hardware targets, runtime configurations, and deployment strategies.
- Debug performance issues involving GPU memory, compute saturation, kernel behavior, CPU/GPU coordination, data movement, and serving-layer overhead.
- Help shape model and system design choices so that research models are efficient to deploy under real latency constraints.
- Where necessary, collaborate with lower-level systems or GPU specialists on custom operators, kernel-level optimization, or hardware-specific performance work.

What we’re looking for

- Experience deploying, optimizing, or operating machine learning inference workloads in production or production-like environments.
- Programming experience in a language such as Python, Java, or C#, and in at least one systems language such as C, C++, Rust, or Go.
- Solid understanding of modern ML frameworks such as PyTorch, including model execution, export, tracing, compilation, and performance profiling.
- Ability to reason about latency, throughput, batching, memory use, GPU utilization, and reliability under real workloads.
- Strong practical judgment around tradeoffs between model quality, latency, throughput, implementation complexity, and maintainability.

Preferred qualifications

- Experience optimizing inference for latency-sensitive or high-throughput applications.
- Experience with model optimization techniques such as quantization, pruning, distillation, operator fusion, graph lowering, custom operators, or model compilation.
- Exposure to CUDA, Triton language, ROCm, PTX, CuTe, CUTLASS, FlashInfer, or similar low-level GPU programming tools.
- Experience running inference workloads on Kubernetes or GPU clusters, including scheduling, autoscaling, observability, and resource management.
- Background in mathematics, physics, computer science, engineering, statistics, quantitative finance, or another technical field.
- Demonstrated ability to improve real-world inference performance beyond a baseline framework implementation.


