Agentic AI, LLM Evaluation, and Trustworthy Systems Research Internship

Here at Siemens, we take pride in enabling sustainable progress through technology. We do this by empowering customers through the combination of the real and digital worlds, improving how we live, work, and move today and for the next generation. We know that the only way a business thrives is if its people are thriving. That’s why we always put our people first. Our global, diverse team would be happy to support you and challenge you to grow in new ways.

Siemens Research & Predevelopment (RPD) is the central R&D department of Siemens and plays a key role in shaping the future of our products. RPD acts as a strategic partner supporting the executive units of Siemens. Consequently, its main research focus is on future technologies for industry, infrastructure, mobility, and healthcare. In this context, we are looking for an intern to support our Software Systems and Processes team in Princeton, NJ, by researching and developing scalable intelligent systems using LLMs and semantic technologies.

Transform the everyday with us!

Are you passionate about ensuring the reliability and robustness of cutting-edge AI systems? We're looking for an innovative PhD intern to join our team and contribute to groundbreaking research focused on implementing a Verification and Validation (V&V) framework for multi-agent systems.

Modern software is rapidly moving from static applications to agentic AI systems that plan, reason, call tools, coordinate across agents, and adapt over multiple steps. As these LLM-powered systems enter industrial workflows, the critical challenge is no longer only building capable agents—it is evaluating, verifying, and validating that they behave reliably, safely, and transparently in complex, uncertain environments. In this internship, you will research and prototype next-generation methods for LLM and multi-agent system evaluation, including benchmarks, guardrails, failure-mode analysis, runtime monitoring, formal methods, and testing technologies. You will help advance trustworthy AI for real-world industrial software systems where robustness, explainability, and dependable performance matter.

The internship provides a unique opportunity to contribute to innovative industrial applications while being mentored by experienced professionals in an international setting.

On-site work in Princeton, NJ, is preferred for a hands-on and collaborative experience; however, remote candidates will be considered. The position is full-time for at least 3 months, with the possibility of extension.

Key Responsibilities

Research, design, and prototype V&V methods for multi-agent and agentic AI systems, with emphasis on reliability, safety, repeatability, explainability, and robustness under uncertain operating conditions.
Develop evaluation harnesses, benchmarks, and test scenarios for LLM-based agents, including tool use, multi-step reasoning, orchestration, failure-mode analysis, and adversarial or edge-case behavior.
Implement proof-of-concept prototypes in Python using modern AI and agent frameworks, formal methods, testing technologies, and retrieval-augmented or knowledge-grounded architectures where appropriate.
Investigate verification strategies such as model checking, property-based testing, fuzz testing, static or dynamic analysis, runtime monitoring, guardrails, and trace-based observability for complex intelligent systems.
Collaborate with researchers and engineers to define milestones, run experiments, analyze results, and translate research insights into scalable industrial software concepts.
Document findings, contribute to scientific publications or technical reports, and present results clearly to internal and external technical audiences.

Basic Qualifications

Currently enrolled in a PhD program in Computer Science, Artificial Intelligence, Machine Learning, Software Engineering, Formal Methods, or a closely related technical field.
3+ years of research or hands-on experience in AI, machine learning, generative AI, software engineering, formal methods, autonomous systems, or intelligent agent systems.
Strong programming skills in Python and practical experience with modern ML or LLM tooling such as PyTorch, Hugging Face Transformers, LangChain, LangGraph, AutoGen, Semantic Kernel, CrewAI, or comparable frameworks.
Hands-on experience building, evaluating, or testing LLM-powered applications, agentic workflows, multi-agent systems, or AI-enabled software engineering tools.
Strong understanding of software architecture, software engineering principles, testing methodologies, experimentation, and empirical evaluation of complex systems.
Demonstrated ability to conduct independent research, read and synthesize technical literature, analyze complex problems, prototype solutions, and communicate findings clearly.
Proficient in English, both written and verbal.
Candidates must be located in the United States of America and hold a valid US work permit for the duration of the internship.

Preferred Skills

Research experience in formal verification, model checking, theorem proving, runtime verification, AI safety, robust AI, explainable AI (XAI), or trustworthy machine learning.
Experience with evaluation of LLMs or agents, including hallucination analysis, benchmark design, tool-use evaluation, prompt-injection testing, red teaming, or reliability metrics.
Familiarity with RAG architectures, vector databases, knowledge graphs, semantic technologies, ontologies, or graph-based reasoning.
Understanding of reinforcement learning, planning, reward modeling, preference optimization, or post-training approaches for LLMs and autonomous agents.
Experience with cloud-native or distributed systems concepts, microservice architectures, APIs, CI/CD, Git, Docker, Kubernetes, Azure, AWS, or comparable platforms.
Experience with testing frameworks for complex software systems, including property-based testing, fuzz testing, simulation-based testing, static analysis, or execution-based evaluation.
Track record of research publications, open-source contributions, academic projects, or demonstrable prototypes related to AI, software engineering, formal methods, or agentic systems.
Excellent problem-solving skills, attention to detail, and ability to quickly learn and apply new technologies, tools, and research methods.
Strong written and verbal communication skills, with the ability to articulate complex technical concepts to research and engineering audiences.

About Siemens:

We are a global technology company focused on industry, infrastructure, transport, and healthcare. From more resource-efficient factories, resilient supply chains, and smarter buildings and grids, to sustainable transportation and advanced healthcare, we create technology with purpose, adding real value for customers. Learn more about Siemens here.

Our Commitment to Equity and Inclusion in our Diverse Global Workforce:

We value your unique identity and perspective. We are fully committed to providing equitable opportunities and building a workplace that reflects the diversity of society, while ensuring that we attract the best talent based on qualifications, skills, and experiences. We welcome you to bring your authentic self and transform the everyday with us.

Vacancy posted 27 days ago
