Machine Learning Engineer: Evaluation

Bedrock Robotics Inc

Join the team bringing advanced autonomy to the built world

At Bedrock, we’re moving AI out of the lab and into the real world. Our team is composed of industry veterans who helped launch Waymo, scaled Segment to a $3.2B acquisition, and grew Uber Freight to $5B in revenue. Today, we’re deploying autonomous systems on heavy construction machinery across the country, accelerating the schedules of billion-dollar infrastructure projects and improving safety on job sites. Backed by $350M in funding, we’re working quickly to close the gap between America’s surging demand for housing, data centers, and manufacturing hubs and the construction industry’s growing labor shortage.

This is where algorithms meet steel-toed boots. You’ll collaborate with construction veterans and world-class engineers to solve physical-world problems that simulations can’t touch. If you’re ready to apply cutting-edge technology to meaningful problems alongside a talented team, we’d love to have you join us.

Bedrock is bringing autonomy to the construction industry! We’re a group of veterans from the autonomous vehicle industry who are passionate about bringing the benefits of automation to areas of the construction industry currently underserved by the market.

We’re looking for a highly motivated engineer with experience evaluating complex ML systems deployed in the real world. Your mission: translate the infinite nuance of the built world into actionable, AI-native evaluations that accelerate Bedrock Operator adoption. The ideal candidate has hands-on experience building evaluation systems and designing and executing statistical tests to gauge performance deltas between system iterations. More importantly, you’ve iterated on complex ML systems running in production environments, and you understand the complexities that come with them.
What you’ll do:
  • Design and maintain eval systems: Build pipelines for measuring system performance across open-loop and closed-loop simulation, hardware-in-the-loop systems, and field data from Bedrock Operator-equipped machinery. Excite other teams to gain insights earlier in the development cycle through streamlined workflows.
  • Develop metrics: Connect product goals and system behavior by bridging real-world specifications to measurable indicators from logged data. Empower confident decision making, from parameter tuning to program planning, by slicing through the noise and delivering objective insights.
  • Classify data sources for training and testing: Implement infrastructure and classifiers to self-annotate data and allow creation of datasets for a variety of training and evaluation use cases. Leverage models to source rich annotations for massive datasets to accelerate model iteration.
  • Predict system performance: Model metrics and interpret results from sources ranging from raw sensor data to key leading indicators. Determine whether new construction sites pose hidden challenges and drive business decisions about deployment readiness.

What we’re looking for:
  • Engineers who are currently Senior or Staff level with 5+ years of professional software engineering, data science, or research experience
  • 2+ years of professional experience analyzing modern ML or robotics system performance on real-world problems
  • Proficiency in Python and a data warehouse query language, and comfort with development on infrastructure within parallelized cloud-based frameworks
  • Strong statistical analysis skills (classification, model fit bias determination, hypothesis testing, and uncertainty quantification)
  • Experience working with large datasets

Bonus points: We’re especially interested in engineers who have applied statistical backgrounds to ML research or real-world robotics applications.

Our roles are often flexible. If you don’t fit all the criteria, or are in another location (especially one where we have an office, like SF or NY), please apply anyway! We’d love to consider you.

Vacancy posted 13 hours ago