Staff ML Inference Engineer — Model Efficiency (Remote)

Jaide Health

Jaide Health is seeking an engineer for its Model Efficiency team in San Francisco. The role focuses on building reliable ML systems while improving core performance metrics across model execution. You'll work with advanced performance techniques such as GPU/CUDA optimizations and collaborate closely with modeling and systems teams. Ideal candidates will have over 5 years of experience in high-performance coding, strong skills in C++ or Python, and insight into the LLM inference ecosystem. The company values diversity and an inclusive work culture.

Vacancy posted 4 days ago
Similar jobs that could be interesting for you
Based on the Staff ML Inference Engineer — Model Efficiency (Remote) in San Francisco, CA vacancy
  •  ...Francisco is seeking a Member of Technical Staff specialized in Model Efficiency. In this role, you will enhance LLM inference systems by tackling performance issues and collaborating...  ...environment. This position offers a remote-friendly work model, a competitive salary,... 
    Remote work

    Cohere

    San Francisco, CA
    1 day ago
  •  ...cutting-edge foundation AI models and end-to-end products that...  ...is a team of researchers, engineers, designers, and more, who are...  ...systems and optimize audio inference serving efficiency using innovative techniques...  ...and London. We embrace a remote-friendly environment, and as... 
    Remote work
    Work at office

    Cohere

    United States
    3 days ago
  •  ...ML Infrastructure Engineer, Model Inference As an ML Infrastructure Engineer, Model Inference at Abridge, you'll play a pivotal role in building and...  ...will be instrumental in enhancing the scalability, efficiency, and performance of our AI-driven solutions. You will... 
    Remote work
    Hourly pay
    Full time
    Flexible hours

    Abridge

    United States
    12 hours ago
  • Cohere is seeking an engineering professional in New York to develop and optimize audio machine...  ...with cross-functional teams to improve audio model metrics, addressing latency and throughput while ensuring real-time audio inference integration. The ideal candidate will... 
    Remote job

    Cohere

    New York, NY
    2 days ago
  •  ...100x better job search engine: fast, comprehensive, honest...  ...looking for a founding ML engineer who can help...  ...powerful AI and ML models into fast, reliable production...  ...models, optimizing inference latency and throughput,...  ...sure our models run efficiently in production. This is... 
    Suggested
    Relocation package

    HiringCafe

    Cupertino, CA
    1 day ago
  •  ...Model Efficiency Team Engineer Cohere is the leading security-first enterprise AI company...  ...focused on building reliable ML systems and pushing the boundaries of LLM inference efficiency. We develop...  ...Where We Work: Cohere is remote-friendly. We have offices in... 
    Remote work
    Work at office

    Cohere

    United States
    3 days ago
  • Member of Technical Staff, Model Efficiency Who are we? Our mission...  ...team of researchers, engineers, designers, and more,...  ...on building reliable ML systems and pushing the boundaries of LLM inference efficiency. We...  ..., Seoul, and London. Remote‑friendly environment,... 
    Remote work
    Full time
    Work at office
    Flexible hours

    Cohere

    San Francisco, CA
    1 day ago
  • $170k - $216k

     ...Machine Learning Engineer, Model Optimization Waymo is an...  ...) develop methods for efficiently and continuously learning...  ...training and model inference through model architecture...  ...~ Experience with ML frameworks like PyTorch...  ...role can be performed remote, the specific salary... 
    Remote work
    Full time

    Waymo

    Mountain View, CA
    4 days ago
  • $128.7k - $261.3k

     ...repeatable, high-velocity model deployments through...  ...deployment and infra engineers to ship numerically robust...  ..., Data Science / ML, or a closely related...  .../ model compression / efficient inference or relevant experience...  ...This role is based remotely, but if the selected candidate... 
    Remote work
    Local area
    Work from home
    Relocation package
    Flexible hours

    General Motors

    Mountain View, CA
    3 days ago
  •  ...in San Francisco is seeking a Staff Research Engineer to enhance the efficiency of large language models. In this role, you will...  ...with model architecture and inference optimization. Join a diverse...  ...innovation within a collaborative and remote-friendly work culture,... 
    Remote work

    Cohere

    San Francisco, CA
    1 day ago
  • $50k - $60k

    Apex Systems is hiring a Principal Machine Learning Engineer for Model Efficiency & Optimization in Austin, Texas. This senior individual contributor role involves overseeing model optimization strategy and ensuring high-performing, production-ready models for document... 

    Apex Systems

    Austin, TX
    12 hours ago
  •  ...deploying frontier models for developers and...  ...of researchers, engineers, designers, and more...  ...can do — but inference is still the bottleneck. The Model Efficiency team is responsible...  ...London. We embrace a remote-friendly environment...  ...locations. As a Staff Research Engineer,... 
    Remote work
    Full time
    Work at office
    Flexible hours

    Cohere

    San Francisco, CA
    4 days ago
  • $242k - $290k

     ...multi-modality foundation model to drive the next...  ...Optimization & Deployment Engineer, you will focus on bringing highly efficient, production-ready large-...  .... You will optimize the ML models, write custom CUDA...  ...build highly concurrent inference code to ensure real-time... 
    Remote work
    Temporary work
    Relocation package

    Zoox

    Nacogdoches, TX
    3 days ago
  • $155.42k - $205.9k

     ...About the Team: The ML Inference Platform is part of...  ...agnostic, reliable, and cost-efficient platform that powers...  ...) machine learning models for experimental, online...  ...ML Infrastructure engineer to help build and scale...  ...relocation benefits. Remote/Hybrid: This role is... 
    Remote work
    Local area
    Work from home
    Relocation
    Relocation package
    Flexible hours

    General Motors

    Austin, TX
    1 day ago
  • A healthcare technology firm in San Francisco is seeking an ML Infrastructure Engineer, Model Inference to build and optimize AI-driven solutions. You will design scalable Kubernetes clusters, enhance ML model serving infrastructure, and collaborate with cross-functional... 

    Abridge

    San Francisco, CA
    4 days ago
  •  ...automotive company seeks a Senior ML Infrastructure Engineer in Austin, Texas, to...  ...backend software for ML inference workflows. The engineer will...  ...ML engineers to ensure efficient model serving and lead technical...  ...compensation and benefits, with a remote work option... 
    Remote work

    General Motors

    Austin, TX
    4 days ago
  • $180k - $275k

     ...builds foundation models of human behavior...  ...from the ground up. Engineers here own major...  ...Databricks Fully remote or hybrid from several...  ...focused on Inference and Serving at Yobi...  ...This is an applied ML systems role-equal...  ...caching, batching, and efficient feature retrieval.... 
    Remote work

    YOBI, LLC

    United States
    12 hours ago
  • $150k - $300k

     ...systems as part of a hybrid team. This role focuses on developing efficient architecture for serving LLMs and optimizing performance using...  ...infrastructure tools. Ideal candidates will have significant experience with ML systems, ensuring robust performance and scalability. The... 
    Remote job

    Prime-Intellect

    San Francisco, CA
    4 days ago
  • $180k - $210k

     ...Overview: The Principal AI/ML Engineer will support the development...  ...learning, and large language models. We offer generous...  ...variety of applications within remote sensing such as tasking collections...  ...engineering techniques / Inference time techniques (e.g. chain of... 
    Remote work
    Temporary work
    Work at office
    Local area
    Visa sponsorship
    Relocation package
    Flexible hours

    ARKA Group

    Aurora, CO
    2 days ago
  •  ...technical Product Manager to own AI inference and model serving for k0rdent AI, our...  ...systems, and performance engineering. You will define how...  ...senior technical role owning AI/ML and inference product(s) ~...  ...future job opportunities. #remote We are a Leader for Container... 
    Remote work

    Mirantis

    Austin, TX
    3 days ago
  •  ...Overview: The Principal AI/ML Engineer will support the development...  ...learning, and large language models. We offer generous...  ...variety of applications within remote sensing such as tasking collections...  ...engineering techniques / Inference time techniques (e.g. chain of... 
    Remote work
    Temporary work
    Work at office
    Local area
    Visa sponsorship
    Relocation package
    Flexible hours

    ARKA Group

    King of Prussia, PA
    2 days ago
  • $180k - $210k

     ...Position Overview The Principal AI/ML Engineer will support the development...  ...learning, and large language models. We offer generous...  ...variety of applications within remote sensing such as tasking collections...  ...engineering techniques / Inference time techniques (e.g. chain of... 
    Remote work
    Full time
    Temporary work
    Work at office
    Local area
    Visa sponsorship
    Relocation package
    Flexible hours

    TSG

    Aurora, CO
    12 hours ago
  •  ...seeking a Senior Machine Learning Engineer to spearhead core machine learning models and manage data pipelines. The ideal...  ...strong technical skills in ML methods, including deep learning,...  ...concepts for various stakeholders. A remote work option is available... 
    Remote job

    thatgamecompany

    Los Angeles, CA
    12 hours ago
  •  ...our Machine Learning and Inference Platform that powers...  ...hardware, software, and models. We're looking for a strong...  ...deep experience in ML serving, high-performance...  ...excited to mentor engineers, innovate at scale, and...  ...Fridays are flexible for remote work except for employees... 
    Remote work
    Work at office
    Local area
    Monday to Thursday
    Flexible hours

    Roku

    Austin, TX
    3 days ago
  • $175k - $280k

     ...New York is seeking an expert in optimizing machine learning models to turbocharge their serving layer, integrating LLM, speech, and...  ...significant experience in systems programming and performance engineering, aiming to improve high-throughput, low-latency serving. Join... 

    Sesame

    New York, NY
    4 days ago
  • $128.7k - $261.3k

     ...Model Deployment & Inference Solutions Team The Model Deployment & Inference Solutions...  ...is two-fold: build the ML deployment platform that...  ...currently performed manually by engineers. Build the developer...  ...AV-1 This role is based remotely, but if the selected candidate... 
    Remote work
    Local area
    Work from home
    Flexible hours
    Shift work

    General Motors

    United States
    3 days ago
  • Principal Machine Learning Engineer - Model Efficiency & Optimization. Job #: 3036752. Location: Austin, Texas (Onsite). Role Overview: We are seeking a Principal Machine Learning... 
    Full time

    Apex Systems

    Austin, TX
    12 hours ago
  •  ...leading technology company is seeking a skilled ML Engineer responsible for developing and maintaining data pipelines for model training and evaluation. Candidates should...  ...competitive compensation, the opportunity to work remotely from anywhere in the world, and access to... 
    Remote job

    Eqvilent

    New York, NY
    4 days ago
  • Israelvcforum is looking for a Senior ML Infrastructure Engineer in Mountain View,...  ...robust platforms for ML inference workflows supporting GM’s AI...  ...and researchers to implement model serving strategies and...  ...skills. The role offers a remote work setup with required visits... 
    Remote job

    Israelvcforum

    Mountain View, CA
    4 days ago
  • $50 per hour

     ...branch of Sony AI, is a remotely distributed...  ...Multimodal Foundation Model for Vision...  ...intern is to develop efficient and effective methodologies...  ...-class scientists and engineers to tackle the most challenging...  ...on model compression, inference speedup, deployment on... 
    Remote work
    Hourly pay
    Internship
    Local area
    Worldwide
    Flexible hours

    Sony

    United States
    12 hours ago
