Research Engineer Intern, Evaluations
TensorStax
Research Engineer Intern, Evaluations & Benchmarks
Location: San Francisco (Hybrid)
About TensorStax:
TensorStax is building fully autonomous AI systems to manage and optimize mission-critical data infrastructure. Our research integrates reinforcement learning and language models to enhance reasoning over large-scale data lakes and warehouses, detect failures in pipelines, and autonomously construct and optimize data workflows with high precision.
We are looking for a Research Engineer Intern to design evaluation frameworks and benchmarks that assess the autonomy, adaptability, and reliability of AI agents in data engineering environments. This role is ideal for candidates passionate about AI evaluations, language model benchmarking, and autonomous data systems.
What You'll Do:
- Develop evaluation environments to test AI agents' ability to reason, plan, and act autonomously within mission-critical data pipelines.
- Design benchmarks to assess model capabilities in failure detection, pipeline optimization, and agentic decision-making in data workflows.
- Implement automated assessment frameworks for language model-based agents operating over data lakes and warehouses.
- Work with synthetic and real-world datasets to create robust testing environments for AI-driven data automation.
- Collaborate with research engineers to refine reward shaping strategies, guiding models toward more efficient and agentic behaviors in data-intensive tasks.
What We're Looking For:
- Experience in language model research, with a focus on benchmarking LLMs in mission-critical domains.
- Strong background in AI evaluation methodologies, reinforcement learning, and RLHF techniques.
- Familiarity with benchmarking language models for structured and unstructured data tasks.
- Proficiency in Python and experience with ML frameworks like PyTorch or JAX.
- Hands-on experience with data lakes, warehouses, and data engineering tools (Snowflake, BigQuery, dbt, Spark, Kafka).
- High agency—proactive, resourceful, and comfortable working in a fast-paced research environment with minimal supervision.
- Attention to detail—ability to design rigorous, reproducible experiments and evaluations.
Bonus Points:
- Contributions to open-source AI benchmarks (e.g., SWE-bench, BIRD, Spider).
- Contributions to open-source agentic frameworks.
- Experience developing custom RL environments for AI evaluation.
- Strong understanding of ETL, ELT, and data transformation pipelines.
Benefits:
- Competitive internship stipend.
- 100% employer-covered health, dental, and vision insurance (for eligible interns).
- Access to Bay Club or Equinox in San Francisco.
- Opportunity to work at the cutting edge of AI evaluations and autonomous data engineering research.