Research Engineer Intern, Evaluations

TensorStax

Research Engineer Intern, Evaluations & Benchmarks

Location: San Francisco (Hybrid)

About TensorStax:

TensorStax is building fully autonomous AI systems to manage and optimize mission-critical data infrastructure. Our research integrates reinforcement learning and language models to enhance reasoning over large-scale data lakes and warehouses, detect failures in pipelines, and autonomously construct and optimize data workflows with high precision.

We are looking for a Research Engineer Intern to design evaluation frameworks and benchmarks that assess the autonomy, adaptability, and reliability of AI agents in data engineering environments. This role is ideal for candidates passionate about AI evaluations, language model benchmarking, and autonomous data systems.

What You'll Do:

  • Develop evaluation environments to test AI agents' ability to reason, plan, and act autonomously within mission-critical data pipelines.
  • Design benchmarks to assess model capabilities in failure detection, pipeline optimization, and agentic decision-making in data workflows.
  • Implement automated assessment frameworks for language model-based agents operating over data lakes and warehouses.
  • Work with synthetic and real-world datasets to create robust testing environments for AI-driven data automation.
  • Collaborate with research engineers to refine reward shaping strategies, guiding models toward more efficient and agentic behaviors in data-intensive tasks.

What We're Looking For:

  • Experience in language model research, with a focus on benchmarking LLMs in mission-critical domains.
  • Strong background in AI evaluation methodologies, reinforcement learning, and RLHF techniques.
  • Familiarity with benchmarking language models for structured and unstructured data tasks.
  • Proficiency in Python and experience with ML frameworks like PyTorch or JAX.
  • Hands-on experience with data lakes, warehouses, and data engineering tools (Snowflake, BigQuery, dbt, Spark, Kafka).
  • High agency—proactive, resourceful, and comfortable working in a fast-paced research environment with minimal supervision.
  • Attention to detail—ability to design rigorous, reproducible experiments and evaluations.

Bonus Points:

  • Contributions to open-source AI benchmarks (e.g., SWE-bench, BIRD, Spider).
  • Contributions to open-source agentic frameworks.
  • Experience developing custom RL environments for AI evaluation.
  • Strong understanding of ETL, ELT, and data transformation pipelines.

Benefits:

  • Competitive internship stipend.
  • 100% employer-covered health, dental, and vision insurance (for eligible interns).
  • Access to Bay Club or Equinox in San Francisco.
  • Opportunity to work at the cutting edge of AI evaluations and autonomous data engineering research.
Vacancy posted 2 days ago