Research Engineer Intern, Evaluations

TensorStax

Research Engineer Intern, Evaluations & Benchmarks

Location: San Francisco (Hybrid)

About TensorStax:

TensorStax is building fully autonomous AI systems to manage and optimize mission-critical data infrastructure. Our research integrates reinforcement learning and language models to enhance reasoning over large-scale data lakes and warehouses, detect failures in pipelines, and autonomously construct and optimize data workflows with high precision.

We are looking for a Research Engineer Intern to design evaluation frameworks and benchmarks that assess the autonomy, adaptability, and reliability of AI agents in data engineering environments. This role is ideal for candidates passionate about AI evaluations, language model benchmarking, and autonomous data systems.

What You'll Do:

Develop evaluation environments to test AI agents' ability to reason, plan, and act autonomously within mission-critical data pipelines.
Design benchmarks to assess model capabilities in failure detection, pipeline optimization, and agentic decision-making in data workflows.
Implement automated assessment frameworks for language model-based agents operating over data lakes and warehouses.
Work with synthetic and real-world datasets to create robust testing environments for AI-driven data automation.
Collaborate with research engineers to refine reward shaping strategies, guiding models toward more efficient and agentic behaviors in data-intensive tasks.

What We're Looking For:

Experience in language model research, with a focus on benchmarking LLMs in mission-critical domains.
Strong background in AI evaluation methodologies, reinforcement learning, and RLHF techniques.
Familiarity with benchmarking language models for structured and unstructured data tasks.
Proficiency in Python and experience with ML frameworks like PyTorch or JAX.
Hands-on experience with data lakes, warehouses, and data engineering tools (Snowflake, BigQuery, dbt, Spark, Kafka).
High agency—proactive, resourceful, and comfortable working in a fast-paced research environment with minimal supervision.
Attention to detail—ability to design rigorous, reproducible experiments and evaluations.

Bonus Points:

Contributions to open-source AI benchmarks (e.g., SWE-bench, BIRD, Spider).
Contributions to open-source agentic frameworks.
Experience developing custom RL environments for AI evaluation.
Strong understanding of ETL, ELT, and data transformation pipelines.

Benefits:

Competitive internship stipend.
100% employer-covered health, dental, and vision insurance (for eligible interns).
Access to Bay Club or Equinox in San Francisco.
Opportunity to work at the cutting edge of AI evaluations and autonomous data engineering research.

Vacancy posted 2 days ago

