Machine Learning Engineer - Inference Optimization | Experienced Hire

Susquehanna International Group LLP

Overview
We are looking for a Machine Learning Engineer focused on low-latency inference optimization to help build, tune, and productionize high-performance model serving systems. This role sits at the intersection of machine learning, systems engineering, and GPU performance. You will work on inference workloads where latency, throughput, reliability, and hardware efficiency all matter, and where a deep understanding of modern inference runtimes can meaningfully improve production outcomes.

You will work closely with quantitative researchers and engineers to understand model structure, identify inference bottlenecks, and turn research ideas into efficient production systems. The work may involve other types of models but focuses on transformer-style architectures and structured inference workloads. You will evaluate and tune inference frameworks and related serving or compilation systems, while also reasoning about GPU execution, memory layout, batching strategies, precision tradeoffs, and end-to-end latency.

What you'll do
  • Design, build, and optimize low-latency inference systems for production machine learning workloads.
  • Profile model inference pipelines across model execution, runtime configuration, batching, memory movement, serialization, networking, and I/O.
  • Evaluate, integrate, and tune inference runtime systems.
  • Improve latency, throughput, and GPU utilization for production inference workloads.
  • Build and support benchmarking and profiling tools to compare model variants, hardware targets, runtime configurations, and deployment strategies.
  • Debug performance issues involving GPU memory, compute saturation, kernel behavior, CPU/GPU coordination, data movement, and serving-layer overhead.
  • Help shape model and system design choices so that research models are efficient to deploy under real latency constraints.
  • Where necessary, collaborate with lower-level systems or GPU specialists on custom operators, kernel-level optimization, or hardware-specific performance work.

What we’re looking for
  • Experience deploying, optimizing, or operating machine learning inference workloads in production or production-like environments.
  • Programming experience in a language such as Python, Java, or C#, plus at least one systems language such as C, C++, Rust, or Go.
  • Solid understanding of modern ML frameworks such as PyTorch, including model execution, export, tracing, compilation, and performance profiling.
  • Ability to reason about latency, throughput, batching, memory use, GPU utilization, and reliability under real workloads.
  • Strong practical judgment around tradeoffs between model quality, latency, throughput, implementation complexity, and maintainability.

Preferred qualifications
  • Experience optimizing inference for latency-sensitive or high-throughput applications.
  • Experience with model optimization techniques such as quantization, pruning, distillation, operator fusion, graph lowering, custom operators, or model compilation.
  • Exposure to CUDA, the Triton language, ROCm, PTX, CuTe, CUTLASS, FlashInfer, or similar low-level GPU programming tools.
  • Experience running inference workloads on Kubernetes or GPU clusters, including scheduling, autoscaling, observability, and resource management.
  • Background in mathematics, physics, computer science, engineering, statistics, quantitative finance, or another technical field.
  • Demonstrated ability to improve real-world inference performance beyond a baseline framework implementation.

Vacancy posted 2 days ago