
Machine Learning Engineer - Evaluation

$120k - $235k

HackerRank

Santa Clara, CA

HackerRank helps companies like NVIDIA, Amazon, and Microsoft hire and upskill the next generation of developers based on skills, not pedigree. Our platform is trusted by over 2,500 of the world's most innovative companies to build strong engineering teams ready for what's next.

Software has entered an era where humans and AI build side by side. As this shift accelerates, the definition of strong technical talent is changing. We give companies better ways to identify and invest in next-generation skills.

People at HackerRank care deeply about the impact of their work and sweat the small details so our customers can be wildly successful with products they genuinely love to use. We move with urgency and believe great outcomes come from high standards.

About the role
Developers used to be evaluated on whether they could write functionally correct code. Now they are evaluated on whether they can orchestrate AI to accomplish the task while still having the fundamentals underneath. That shift, from what used to matter to what matters now, is exactly the problem this role is hired to solve.

Open Problem
How do you measure skill when AI is already in the room?

Software engineering has moved from writing code to using AI to solve problems. That shift sounds simple. The implications for assessment are not. This is not just a take-home assignment problem. It spans live interviews, async assessments, AI-assisted coding environments, pair programming with agents, and every other context in which someone is trying to figure out how good a developer actually is. The tools developers use are changing fast. The frameworks we use to evaluate them have not kept up.

For over a decade, skills-based hiring relied on deterministic evaluation: a candidate's code either passed test cases or it did not. The score was binary and reproducible. What replaces it is genuinely unsolved. Nobody has cracked how to fairly assess human skill in a world where AI assistance is ambient and invisible, where the question is no longer "can you write this function" but "how effectively do you use AI to solve a real problem."

We are moving from deterministic evaluation to evaluation by a council of LLMs. But making that consistent, scalable, and defensible across hundreds of thousands of assessments is a hard research and engineering problem. How do you ensure the same rubric is applied the same way to the 200,000th candidate as to the first? How do you detect when your evaluation model is drifting? How do you explain a score to a candidate who believes they were assessed unfairly?

HackerRank sits at the center of this problem with a rare combination of scale, longitudinal data, and direct relationships with the companies making hiring decisions. The opportunity here is to define what rigorous, fair, and meaningful skill evaluation looks like in the agentic era. That methodology does not exist yet. This role exists to build it.
What you will do
  • Build LLM-powered evaluation pipelines that assess AI usage skills consistently, fairly, and at production scale
  • Own the evaluation methodology end to end: what the rubric is, how the model applies it, how you measure whether it is being applied correctly, and how you audit for bias
  • Design and run experiments to determine what good evaluation actually looks like. The answer is not known. You will be finding it
  • Build RAG pipelines and fine-tuning workflows that make evaluation models adhere reliably to the rules we set for them
  • Define the benchmarking infrastructure: how we know when our evaluation quality has improved, and how we catch regressions before candidates do
  • Translate model behavior into outcomes that product managers, enterprise customers, and candidates can understand and trust
Who you are
  • You have shipped LLM-powered systems in production where consistency and reliability were hard constraints, not nice-to-haves
  • You think as rigorously about how you measure your model as about the model itself. A poorly constructed eval is a worse outcome than a weaker model
  • You have a research mindset. You are comfortable operating in a space where the right methodology does not exist yet and needs to be invented
  • You think in systems. The data pipeline, the model, the serving layer, and the rubric it enforces are one problem to you
  • You can defend ML judgment in plain language to people who are not ML engineers, because the translation layer is part of the job
Even better if you have
  • Experience building evaluation frameworks for generative or conversational AI systems
  • Background in educational assessment, psychometrics, or human-in-the-loop evaluation at scale
  • Publications or open-source contributions in LLM evaluation, benchmarking, or alignment
  • Prior work at the interface of research and product, where you had to ship science, not just publish it
You will thrive here if
  • You find the measurement problem as interesting as the model problem, maybe more so.
  • You hold evaluation methodology to the same standard as model performance, and you are uncomfortable shipping something you cannot explain.
  • You want your work to define what good looks like in a field that is just now figuring that out.
Compensation

The annual US on-target earnings (OTE) range for this role is $120,000 - $235,000, which includes base salary and target bonus. This range may span multiple career levels at HackerRank and will be refined during the interview process based on factors such as the candidate's experience, qualifications, and location. Compensation for this role includes base salary, target bonus, and equity.

Want to learn more about HackerRank? Check out HackerRank.com to explore our products, solutions, and resources, and to learn about our story and mission.

HackerRank is a proud equal employment opportunity and affirmative action employer. We provide equal opportunity to everyone for employment based on individual performance and qualification. We never discriminate based on race, religion, national origin, gender identity or expression, sexual orientation, age, marital, veteran, or disability status. All your information will be kept confidential according to EEO guidelines.


Notice to prospective HackerRank job applicants:
  • Our recruiters use @hackerrank.com email addresses.
  • We never ask for payment or credit check information to apply, interview, or work here.
Vacancy posted 2 days ago