Machine Learning Engineer: Evaluation

Bedrock Robotics Inc

Join the team bringing advanced autonomy to the built world

At Bedrock, we're moving AI out of the lab and into the real world. Our team is composed of industry veterans who helped launch Waymo, scaled Segment to a $3.2B acquisition, and grew Uber Freight to $5B in revenue. Today, we're deploying autonomous systems on heavy construction machinery across the country, accelerating the schedules of billion-dollar infrastructure projects and improving safety on job sites. Backed by $350M in funding, we're working quickly to close the gap between America's surging demand for housing, data centers, and manufacturing hubs and the construction industry's growing labor shortage.

This is where algorithms meet steel-toed boots. You'll collaborate with construction veterans and world-class engineers to solve physical-world problems that simulations can't touch. If you're ready to apply cutting-edge technology to solve meaningful problems alongside a talented team, we'd love to have you join us.

Bedrock is bringing autonomy to the construction industry! We're a group of veterans from the autonomous vehicle industry who are passionate about bringing the benefits of automation to areas in the construction industry currently underserved by the market.

We're looking for a highly motivated engineer with experience evaluating complex ML systems deployed in the real world. Your mission: translate the infinite nuance of the built world into actionable, AI-native evaluations that accelerate Bedrock Operator adoption. The ideal candidate has hands-on experience building evaluation systems and designing and executing statistical tests to gauge performance deltas between system iterations. More importantly, you've iterated on complex ML systems running in production environments, and you understand the complexities that come with them.

What you'll do:
Design and maintain eval systems: Build pipelines for measuring system performance across open-loop and closed-loop simulation, hardware-in-the-loop systems, and field data from Bedrock Operator-equipped machinery. Enable other teams to gain insights earlier in the development cycle through streamlined workflows.
Develop metrics: Connect product goals and system behavior by bridging real-world specifications to measurable indicators from logged data. Empower confident decision making, from parameter tuning to program planning, by slicing through the noise and delivering objective insights.
Classify data sources for training and testing: Implement infrastructure and classifiers to self-annotate data and allow creation of datasets for a variety of training and evaluation use cases. Leverage models to source rich annotations for massive datasets to accelerate model iteration.
Predict system performance: Model metrics and interpret results from sources ranging from raw sensor data to key leading indicators. Determine whether new construction sites pose hidden challenges and drive business decisions about deployment readiness.

What we're looking for:
Engineers who are currently Senior or Staff level with 5+ years of professional software engineering, data science, or research experience
2+ years of professional experience analyzing modern ML or robotics system performance on real-world problems
Proficiency in Python and a data warehouse query language, and comfort with development on infrastructure within parallelized cloud-based frameworks
Strong statistical analysis skills (classification, model fit bias determination, hypothesis testing, and uncertainty quantification)
Experience working with large datasets

Bonus points:
We're especially interested in engineers who have applied statistical backgrounds to ML research or real-world robotics applications.

Our roles are often flexible. If you don't fit all the criteria, or are in another location (especially one where we have an office, like SF or NY), please apply anyway! We'd love to consider you.


Vacancy posted 13 hours ago

