ML Evaluation Engineer — Real‑World AI Metrics

Arcada Labs Incorporated

Arcada Labs Incorporated is seeking an ML Research Engineer in San Francisco to lead evaluations of AI models based on human preferences. You will design experiments and analysis pipelines that deepen our understanding of AI capabilities, and you will contribute to user-facing tools and leaderboards. Ideal candidates have experience with modern AI systems and model evaluation methodologies, strong statistical judgment, and a passion for advancing model capabilities.

Vacancy posted 3 days ago
Similar jobs that could be interesting for you
Based on the ML Evaluation Engineer — Real‑World AI Metrics vacancy in San Francisco, CA
  •  ...Job Opportunity At Dynamo AI At Dynamo AI, we believe that LLMs must...  ...developed with safety, privacy, and real-world responsibility in mind. Our ML team comes from a culture of academic...  ...Responsibilities Own LLM evaluation processes and methods with a focus... 
    Suggested
    Local area
    Shift work

    Dynamo AI

    San Francisco, CA
    3 days ago
  •  ...autonomy to the built world At Bedrock, we're moving AI out of the lab and into the real world. Our team...  ...and world-class engineers to solve...  ...Learning Engineer: Evaluation Bedrock is...  ...evaluating complex ML systems deployed...  .... Develop metrics: ~... 
    Suggested
    Work at office
    Flexible hours

    Bedrock Robotics

    San Francisco, CA
    2 days ago
  • $240.45k - $300.3k

    Senior Machine Learning Engineer - Model Evaluations, Public Sector The Public Sector ML team at Scale deploys advanced AI systems—including LLMs, agentic...  ..., and effectively under real-world constraints. As an ML...  ..., robustness, and safety metrics, including LLM-judge-... 
    Suggested
    Full time

    Scale AI

    San Francisco, CA
    2 days ago
  •  ...Systems, Inc in San Francisco is seeking a Senior Machine Learning Engineer to lead perception architecture for defense applications. The ideal candidate will have over 5 years of experience in real-world perception systems, strong skills in Python and C++, and a proven... 
    Suggested

    Aurelius Systems, Inc

    San Francisco, CA
    20 hours ago
  • $129k - $198.4k

     ...Job Description Role: As an AI/ML Engineer on the Metrics Frameworks team, part of the Simulation, Evaluation, and Data organization, you will be an individual contributor...  ...manager}. About GM Our vision is a world with Zero Crashes, Zero Emissions and Zero Congestion... 
    Suggested
    Local area
    Work from home

    General Motors

    San Francisco, CA
    2 days ago
  •  ...Senior ML/RL Engineer, Behavior Planning At Bot...  ...achieved numerous world-firsts and unparalleled...  ...and the real world by developing...  ...to ensure safety metrics are treated as primary...  ...reward functions and evaluation metrics that...  ...passionate about AI, safety, and transforming... 
    Shift work

    Bot Auto

    San Francisco, CA
    1 day ago
  •  ...Machine Learning Engineer Location:...  ...OpenAI for Physics. AI startup based in...  ...is hiring an ML Engineer to help...  ...model training and evaluation. Run training...  ...clearly (metrics, dashboards, short...  ...Passionate about solving real customer...  ...collaboration Access to world-class investors... 
    Work at office
    Flexible hours
    1 day per week

    UniversalAGI

    San Francisco, CA
    1 day ago
  •  ...Founding Ml Engineer Skills: Python, PyTorch, NLP, LLMs...  ...to the internet for AI agents. Our APIs already...  ...know how to build and evaluate retrieval systems,...  ...contrastive learning, metric learning, and representation...  ...systems over messy real-world data Background in... 

    Crustdata (YC F24)

    San Francisco, CA
    20 hours ago
  •  ...ML Ops Engineer — Agentic AI Lab (Founding Team) Location: San Francisco...  ...VC, we're building a world-class team to tackle...  ...reproducibility, and evaluation capture Create and...  ...usage, performance metrics, error tracing, and...  ...solve some really hard real world problems – we... 
    Full time

    Fabrion

    San Francisco, CA
    3 days ago
  •  ...ML Engineer We're looking for an ML Engineer with...  ...and owning production AI and ML systems used by real people. You're...  ...with clear quality metrics and business impact....  ...strong judgment around evaluation and know how to measure...  ...markets in the world: hiring. We partner... 

    Paraform

    San Francisco, CA
    3 days ago
  • $200k - $365k

     ...Plaud is building the world's most trusted AI work companion for...  ...defensible, and automated metrics that researchers and...  ...strong software engineering skills (especially in...  ..., data pipelines, or evaluation harnesses that can run...  ...deeply partner with ML researchers to define... 
    Full time
    Work at office
    Worldwide

    Plaud

    San Francisco, CA
    1 day ago
  •  ...Founding Applied ML Engineer Title of Role: Founding...  ...representing an early-stage AI company that operates...  ...NLP to tackle real-world data challenges. Collaborate...  ...workflows. Evaluate and benchmark model performance...  ..., establish quality metrics, and drive continuous... 
    Work at office

    Recruiting from Scratch

    San Francisco, CA
    3 days ago
  • $204.5k - $267k

     ...Senior Data Engineer Formation Bio is a tech and AI driven pharma company differentiated by radically more efficient drug development. Advancements...  ...Intelligence (SDI) team at Formation Bio to help transform Real World Data (RWD)—spanning electronic health records, claims,... 
    Work at office
    Local area
    Relocation
    3 days per week

    Formation Bio (Formerly TrailSpark)

    San Francisco, CA
    3 days ago
  •  ...AI/ML Engineer Ello's mission is to maximize the potential...  ..., we're building the world's first AI teacher:...  ...problems while delivering real-world products. We use...  ..., designing the evaluations that prove they work,...  ...questions, to concrete metrics and research plans, back... 
    Work at office
    Worldwide
    Shift work

    Ello

    San Francisco, CA
    1 day ago
  • $170k - $216k

     ...Description: AI/ML Python engineer The Perception team builds...  ...that "perceives" the world around the car. We work jointly...  ...own research to address real-world problems and...  ...Develop and maintain model evaluation recipes and metrics for measuring and improving... 
    Full time
    Remote work

    ESR Healthcare

    San Francisco, CA
    1 day ago
  •  ...machine learning and AI backbone behind...  ...both the data and ML foundations and...  ..., model training, evaluation, and inference....  ...multivariate) to measure real-world outcomes such as...  ...with platform engineers and product designers...  ...combining offline metrics (AUC, NDCG) and online... 

    pear.ai

    San Francisco, CA
    3 days ago
  • $131.4k - $235.95k

     ...creative people in the world. As a Senior...  ...Learning Engineer focused on Machine...  ...you will ensure AI-powered...  ...with researchers, evaluation engineers, and product...  ...performance in real customer...  ...and performance metrics for deployed services...  ...running production ML or LLM inference... 
    For contractors
    Remote work

    Autodesk

    San Francisco, CA
    4 days ago
  •  ...enterprise infrastructure and AI, with leadership roots at...  ...simplify AI integration into real-world systems, with the observability...  ...looking for a visionary Senior ML Engineer who will bridge the gap...  ...clustering models; utilizing evaluation frameworks to quantify performance... 
    Shift work

    Palm Venture Studios

    San Francisco, CA
    3 days ago
  •  ...ML Engineer San Francisco, California, United States Or refer someone Job...  ...sales and service teams work. Their AI technology captures and analyzes real-world conversations, providing full...  ...pipelines for model training and evaluation. Familiarity with FastAPI, OpenAI... 
    Full time

    Catalyst Labs, LLC

    San Francisco, CA
    1 day ago
  • $172.2k - $258.4k

     ...Senior Machine Learning Engineer to join our Ads...  ...member of the Vector AI group, you will play a...  ...creatives Conduct offline evaluations and online A/B...  ...learning models to complex real-world problems Strong software...  ...understanding of metric design, experimentation... 
    Work at office
    Worldwide
    Relocation package

    UNITY

    San Francisco, CA
    3 days ago
  • $150k - $300k

     ...Founding ML Engineer Location: San Francisco...  ...layer that enables AI agents to access,...  ...understand, and act on real-time internet data...  ...to real-world data problems Leverage...  ...Continuously evaluate and improve model...  ...experimentation and metrics Work closely with... 
    Visa sponsorship

    Recruiting from Scratch

    San Francisco, CA
    3 days ago
  • $205k - $316.4k

     ...Machine Learning Engineer At Quizlet,...  ...design and deliver AI-powered learning...  ...across the world and unlock human...  ...systems that drive real-time product decisions...  ...scalable ML systems that drive...  ...for training, evaluation, deployment, and...  ...connecting offline metrics to online impact... 
    Work at office
    3 days per week

    Quizlet

    San Francisco, CA
    2 days ago
  • $200k - $300k

     ...Glean is the Work AI platform that...  ...agents that automate real work across...  ...enterprise and world, structured and...  ...better over time: evaluation pipelines, quality...  ...and the tooling engineers use to...  ...engineering, applied ML, and direct product...  ...judges that score metrics like correctness... 
    Home office
    Flexible hours
    3 days per week

    Glean.info

    San Francisco, CA
    1 day ago
  •  ...builds general-purpose AI for the physical world. Training our models...  ...The Team The ML Infrastructure team supports...  ...- Strong software engineering fundamentals - Experience...  ..., checkpointing, and metrics/logging. - Scale...  ..., modalities, and evaluation metrics. What We Hope... 
    Flexible hours

    Physical Intelligence

    San Francisco, CA
    20 hours ago
  •  ...not yet rebuilt around AI-Poesis is leading that...  ...research with immediate real-world validation where your work...  ...We're hiring an ML Engineer who will turn research...  ...Implement backtesting and evaluation frameworks with clear performance metrics. Deliver regular, documented... 
    Full time
    Work at office
    Immediate start
    Visa sponsorship
    Work visa
    Relocation package
    3 days per week

    Poesis LLC

    San Francisco, CA
    2 days ago
  •  ...autonomy to the built world At Bedrock, we're moving AI out of the lab and into the real world. Our team is...  ...veterans and world-class engineers to solve physical-...  ...datasets Build metrics to evaluate model performance in...  ...teams to integrate ML models into real-world... 
    Work at office
    Flexible hours

    Bedrock Robotics

    San Francisco, CA
    2 days ago
  •  ...platform for leading AI teams who demand...  ...over $100M from world-class investors including...  ...As an ML Eval Engineer, you'll play a key...  ...role in building the evaluation systems and benchmarks...  ..., and create metrics and tooling that surface...  ...large and messy real-world datasets.... 
    Work at office
    Local area

    Reducto

    San Francisco, CA
    2 days ago
  •  ...Machine Learning Engineer Bucket Robotics is hiring...  ...You'll work on the core ML systems that turn 3D geometry...  ...Design, train, and evaluate computer vision and ML...  ...data, and limited real-world data Run rigorous experiments...  ...frameworks and metrics that reflect real-world... 
    Shift work

    Bucket Robotics

    San Francisco, CA
    3 days ago
  •  ...optimal sleep. As the world's first sleep...  ..., software, and AI technology to...  ...Machine Learning Engineer to build and...  ...prototyping → offline evaluation → online...  ...Build and deploy ML models that improve...  ...capabilities to real product workflows...  ...strategies (offline metrics, slice-based... 
    Full time
    Immediate start
    Worldwide
    Night shift

    Eight Sleep

    San Francisco, CA
    2 days ago
  • $204k - $259k

     ...Senior Machine Learning Engineer – VLM/LLM Evaluation Waymo is an autonomous driving...  ...with the mission to be the world's most trusted driver. Since...  ...The mission of the Waymo AI Foundations team is to develop...  ...experience Experience in ML engineering and applied Deep... 
    Full time
    Temporary work
    Remote work

    Waymo

    San Francisco, CA
    4 days ago