Machine Learning Engineer: Evaluation
Bedrock Robotics Inc
Join the team bringing advanced autonomy to the built world

At Bedrock, we’re moving AI out of the lab and into the real world. Our team is composed of industry veterans who helped launch Waymo, scaled Segment to a $3.2B acquisition, and grew Uber Freight to $5B in revenue. Today, we’re deploying autonomous systems on heavy construction machinery across the country, accelerating the schedules of billion-dollar infrastructure projects and improving safety on job sites. Backed by $350M in funding, we’re working quickly to close the gap between America’s surging demand for housing, data centers, and manufacturing hubs and the construction industry’s growing labor shortage.

This is where algorithms meet steel-toed boots. You’ll collaborate with construction veterans and world-class engineers to solve physical-world problems that simulations can’t touch. If you’re ready to apply cutting-edge technology to meaningful problems alongside a talented team, we’d love to have you join us.

Bedrock is bringing autonomy to the construction industry! We’re a group of veterans from the autonomous vehicle industry who are passionate about bringing the benefits of automation to areas of the construction industry currently underserved by the market.

We’re looking for a highly motivated engineer with experience evaluating complex ML systems deployed in the real world. Your mission: translate the infinite nuance of the built world into actionable, AI-native evaluations that accelerate Bedrock Operator adoption.

The ideal candidate has hands-on experience building evaluation systems and designing and executing statistical tests to gauge performance deltas between system iterations. More importantly, you’ve iterated on complex ML systems running in production environments, and you understand the complexities that come with that work.
What you’ll do:

- Design and maintain eval systems: Build pipelines for measuring system performance across open-loop and closed-loop simulation, hardware-in-the-loop systems, and field data from Bedrock Operator-equipped machinery. Excite other teams to gain insights earlier in the development cycle through streamlined workflows.
- Develop metrics: Connect product goals and system behavior by bridging real-world specifications to measurable indicators from logged data. Empower confident decision making, from parameter tuning to program planning, by slicing through the noise and delivering objective insights.
- Classify data sources for training and testing: Implement infrastructure and classifiers to self-annotate data and allow creation of datasets for a variety of training and evaluation use cases. Leverage models to source rich annotations for massive datasets to accelerate model iteration.
- Predict system performance: Model metrics and interpret results from sources ranging from raw sensor data to key leading indicators. Determine whether new construction sites pose hidden challenges and drive business decisions about deployment readiness.

What we’re looking for:

- Engineers who are currently Senior or Staff level with 5+ years of professional software engineering, data science, or research experience
- 2+ years of professional experience analyzing modern ML or robotics system performance on real-world problems
- Proficiency in Python and a data warehouse query language, and comfort with development on infrastructure within parallelized cloud-based frameworks
- Strong statistical analysis skills (classification, model fit bias determination, hypothesis testing, and uncertainty quantification)
- Experience working with large datasets

Bonus points: We’re especially interested in engineers who have applied statistical backgrounds to ML research or real-world robotics applications.

Our roles are often flexible. If you don’t fit all the criteria, or are in another location (especially one where we have an office, like SF or NY), please apply anyway! We’d love to consider you.

