
Agentic AI, LLM Evaluation, and Trustworthy Systems Research Internship

Here at Siemens, we take pride in enabling sustainable progress through technology. We do this by empowering customers through combining the real and digital worlds, improving how we live, work, and move, today and for the next generation! We know that a business can only thrive if its people are thriving. That's why we always put our people first. Our global, diverse team would be happy to support you and challenge you to grow in new ways.

Siemens Research & Predevelopment (RPD) is the central R&D department of Siemens and therefore plays a key role in shaping the future of our products. RPD acts as a strategic partner to the executive units of Siemens. Consequently, its main research focus is on future technologies for industry, infrastructure, mobility, and healthcare. In this context, we are looking for an intern to support our Software Systems and Processes team in Princeton, NJ, by researching and developing scalable intelligent systems using LLMs and semantic technologies.

Transform the everyday with us!

Are you passionate about ensuring the reliability and robustness of cutting-edge AI systems? We're looking for an innovative PhD intern to join our team and contribute to groundbreaking research focused on implementing a Verification and Validation (V&V) framework for multi-agent systems. 

Modern software is rapidly moving from static applications to agentic AI systems that plan, reason, call tools, coordinate across agents, and adapt over multiple steps. As these LLM-powered systems enter industrial workflows, the critical challenge is no longer only building capable agents—it is evaluating, verifying, and validating that they behave reliably, safely, and transparently in complex, uncertain environments. In this internship, you will research and prototype next-generation methods for LLM and multi-agent system evaluation, including benchmarks, guardrails, failure-mode analysis, runtime monitoring, formal methods, and testing technologies. You will help advance trustworthy AI for real-world industrial software systems where robustness, explainability, and dependable performance matter.

The internship provides a unique opportunity to contribute to innovative industrial applications while being mentored by experienced professionals in an international setting.

This role is preferably on-site in Princeton, NJ, for a hands-on and collaborative experience; however, remote candidates will be considered. The position is full-time for at least 3 months, with the possibility of extension.

Key Responsibilities

  • Research, design, and prototype V&V methods for multi-agent and agentic AI systems, with emphasis on reliability, safety, repeatability, explainability, and robustness under uncertain operating conditions.
  • Develop evaluation harnesses, benchmarks, and test scenarios for LLM-based agents, including tool use, multi-step reasoning, orchestration, failure-mode analysis, and adversarial or edge-case behavior.
  • Implement proof-of-concept prototypes in Python using modern AI and agent frameworks, formal methods, testing technologies, and retrieval-augmented or knowledge-grounded architectures where appropriate.
  • Investigate verification strategies such as model checking, property-based testing, fuzz testing, static or dynamic analysis, runtime monitoring, guardrails, and trace-based observability for complex intelligent systems.
  • Collaborate with researchers and engineers to define milestones, run experiments, analyze results, and translate research insights into scalable industrial software concepts.
  • Document findings, contribute to scientific publications or technical reports, and present results clearly to internal and external technical audiences. 

Basic Qualifications

  • Currently enrolled in a PhD program in Computer Science, Artificial Intelligence, Machine Learning, Software Engineering, Formal Methods, or a closely related technical field.
  • 3+ years of research or hands-on experience in AI, machine learning, generative AI, software engineering, formal methods, autonomous systems, or intelligent agent systems.
  • Strong programming skills in Python and practical experience with modern ML or LLM tooling such as PyTorch, Hugging Face Transformers, LangChain, LangGraph, AutoGen, Semantic Kernel, CrewAI, or comparable frameworks.
  • Hands-on experience building, evaluating, or testing LLM-powered applications, agentic workflows, multi-agent systems, or AI-enabled software engineering tools.
  • Strong understanding of software architecture, software engineering principles, testing methodologies, experimentation, and empirical evaluation of complex systems.
  • Demonstrated ability to conduct independent research, read and synthesize technical literature, analyze complex problems, prototype solutions, and communicate findings clearly.
  • Proficient in English, both written and verbal.
  • Candidates must be located in the United States and hold a valid US work permit for the duration of the internship.

Preferred Skills

  • Research experience in formal verification, model checking, theorem proving, runtime verification, AI safety, robust AI, explainable AI (XAI), or trustworthy machine learning.
  • Experience with evaluation of LLMs or agents, including hallucination analysis, benchmark design, tool-use evaluation, prompt-injection testing, red teaming, or reliability metrics.
  • Familiarity with RAG architectures, vector databases, knowledge graphs, semantic technologies, ontologies, or graph-based reasoning.
  • Understanding of reinforcement learning, planning, reward modeling, preference optimization, or post-training approaches for LLMs and autonomous agents.
  • Experience with cloud-native or distributed systems concepts, microservice architectures, APIs, CI/CD, Git, Docker, Kubernetes, Azure, AWS, or comparable platforms.
  • Experience with testing frameworks for complex software systems, including property-based testing, fuzz testing, simulation-based testing, static analysis, or execution-based evaluation.
  • Track record of research publications, open-source contributions, academic projects, or demonstrable prototypes related to AI, software engineering, formal methods, or agentic systems.
  • Excellent problem-solving skills, attention to detail, and ability to quickly learn and apply new technologies, tools, and research methods.
  • Strong written and verbal communication skills, with the ability to articulate complex technical concepts to research and engineering audiences.

About Siemens:

We are a global technology company focused on industry, infrastructure, transport, and healthcare. From more resource-efficient factories, resilient supply chains, and smarter buildings and grids to sustainable transportation and advanced healthcare, we create technology with purpose that adds real value for customers.

Our Commitment to Equity and Inclusion in our Diverse Global Workforce:

We value your unique identity and perspective. We are fully committed to providing equitable opportunities and building a workplace that reflects the diversity of society, while ensuring that we attract the best talent based on qualifications, skills, and experiences. We welcome you to bring your authentic self and transform the everyday with us.

#LI-JS

#LI-Remote

#ArtificialIntelligence, #MachineLearning, #GenerativeAI

Vacancy posted 7 days ago
Similar jobs that may interest you, based on the Agentic AI, LLM Evaluation, and Trustworthy Systems Research Internship vacancy in Princeton, NJ:
  •  ...Engineering of Intelligent Systems Researcher Internship Here at Siemens,...  ...the boundaries of AI and software...  ...focused on advanced agentic AI software architectures...  ...~ Develop and evaluate novel algorithms and...  ...mechanisms, and the latest LLM architectures (e.g.,... 
    Internship
    Full time
    Immediate start
    Remote work
    Flexible hours
    Princeton, NJ
    15 days ago
  •  ...Cybersecurity & AI Research Intern: AI-Assisted Vulnerability...  ...a benchmark and evaluation workflow for vulnerability...  ...methods. This internship is offered as an on-site...  ...experience with open-source LLM tooling and evaluation...  ...such as control systems used in energy utilities... 
    Internship
    Full time
    Work at office
    Immediate start
    Relocation package
    Flexible hours
    Princeton, NJ
    28 days ago
  • $132k - $142k

     .... Through ETS Research Institute and ETS...  ...Digital Workplace AI Engineer/...  ...This role builds agentic systems that automate workflows...  ...tools are impactful, trustworthy, and scalable. If...  ...solutions. • LLM and AI experience...  ...designing and evaluating LLM-powered conversational... 
    Suggested
    Worldwide

    Educational Testing Service

    Princeton, NJ
    5 days ago
  • $80.12 - $101.75 per hour

     ...Senior IT Business Systems Analyst This is a hybrid role. New...  ...predictive capabilities, Insilco research, and discovery insights. This...  ...of effective, usable, and trustworthy data products. User Acceptance...  ...in learning and/or applying AI/Client to drug discovery.... 
    Suggested
    Hourly pay

    Omni Inclusive

    Princeton, NJ
    4 hours ago
  • $180k - $260k

     ...Principal Scientist, Data Science and AI Location: Princeton, NJ...  ...multi-criteria scoring systems to prioritize ingredients based...  ...groups, Life Science and other research organizations, as well as the...  ...using Large Language Models (LLM) for scientific knowledge extraction... 
    Suggested
    Hourly pay

    Dérivés Résiniques et Terpéniques

    Princeton, NJ
    4 days ago
  • $115k - $140k

     ...Corporation is seeking a Research Scientist to develop and deploy practical AI/ML capabilities that...  ...molecular design, agentic AI workflows, and scientific...  ..., implement, and evaluate machine learning...  ...including agentic AI systems, tool-harnessed LLM workflows, and scientific... 
    Temporary work
    Worldwide
    Flexible hours

    Universal Display Corporation

    Trenton, NJ
    23 days ago
  • $235k - $275k

     ...worldwide. Through ETS Research Institute and ETS...  ...Center for Responsible AI in Learning and Assessment...  ...and large-scale model evaluation. This role provides strategic...  ...tasks) - LLM-enabled assessment design...  ...and automated scoring systems     - LLMs and evaluation... 
    Full time
    Remote work
    Worldwide

    ETS

    Princeton, NJ
    more than 2 months ago
  • $111.4k - $178k

     ...Physics Laboratory has an open position for a research staff scientist onsite at PPPL. The...  ...advance the application of real‑time control systems to experimental fusion systems, including...  ...and stellarators. The ability to employ AI‑enabled solutions and AI‑based state... 
    Full time
    Worldwide

    University of Georgia- FACS

    Princeton, NJ
    1 day ago
  • $225k - $300k

     ...access. Each organization (Evaluate, MMIT, Panalgo,...  ...culture, and delivering AI-powered solutions that...  ...the adoption of GenAI, Agentic AI, and LLM-powered architectures across...  ..., production-grade AI systems Stay at the cutting edge of AI/ML research; deliver regular presentations... 
    Full time
    Temporary work
    Local area
    Remote work
    Flexible hours

    Norstella

    Trenton, NJ
    2 days ago
  • $108.65k - $131.66k

     ...with-us. Summary: As an AI Engineer within...  ...pipelines, integrations, and agentic AI systems that power AI...  ...interoperability. Implement LLM-powered workflows integrating...  ...API access. Build evaluation and observability...  ...Engineers, Pod Leads, and Research scientists to deliver... 
    Hourly pay
    Full time
    Temporary work
    Part time
    For contractors
    Summer work
    Live in
    Work at office
    Local area
    Remote work
    Flexible hours
    Shift work

    Bristol Myers Squibb

    Princeton, NJ
    18 hours ago
  • $30 per hour

     ...the Oracle Veteran Internship Program (OVIP):...  ...technology, technical/systems consulting, technical...  ...and deepest suite of AI-powered cloud applications...  ...the "Autonomous, Agentic Supply Chain of the...  ...demo scripts. Research & Analysis Evaluate logistics AI use cases... 
    Internship
    Hourly pay
    Temporary work
    Flexible hours

    Oracle

    Trenton, NJ
    2 days ago
  • $59.38k - $92.35k

     ...and oversight; and (iv) remediation system O&M; Evaluating chemical data and contaminant fate &...  ...Experience and Qualifications Prior research experience in contaminated media or...  ...remediation technology. (preferred) Prior internship experience in environmental... 
    Internship
    Full time
    For contractors
    For subcontractor
    Work at office
    Remote work
    Night shift

    Geosyntec Consultants

    Pennington, NJ
    2 days ago
  • $95.3k - $158.8k

     ...platforms: Clinical Key AI, Sherpath AI, and...  ...services. Our systems operate over one...  ...features (GenAI, Agentic AI, RAG, etc.) search...  ...ML & LLM Engineering, Search...  ...graph DBs . Build evaluation pipelines: offline...  ...with the latest GAI research, NLP and RAG and apply... 
    Local area

    RELX

    Trenton, NJ
    9 days ago
  •  ...Job Title Hands-on experience with agentic AI and frameworks such as Semantic Kernel and...  ...pipelines. Experience building multi-agent systems, using Lang graph or Azure AI Factory....  ..., Spark and Spark ML. Experience in evaluating the AI ML models for Telco operations... 

    Diverse Lynx

    Princeton, NJ
    3 hours ago
  • $225k - $337.5k

     ...and operate enterprise AI platforms at scale, enabling...  ...ML, Generative AI, and Agentic AI—using modern...  ...foundation models) Agentic AI systems and orchestration...  ...Generative AI platforms (LLM integration, orchestration...  ..., prompt management, evaluation) Agentic AI platforms (... 
    Full time
    Temporary work
    Flexible hours

    State Street

    Princeton, NJ
    4 days ago
  • $186.07k - $218.9k

     ...future global financial system. To achieve our...  ...dedicated to developing AI infrastructure and automation...  ...models (LLMs) and Agentic AI, to solve complex challenges...  .... Background in AI/LLM infrastructure is a...  ...you to carefully evaluate how your skills and interests... 
    Local area

    Coinbase

    Trenton, NJ
    4 days ago
  •  ...Application Engineer (AI Venture Studio delivery...  ...applications, including agentic AI products and knowledge...  ...services integrating LLM APIs, retrieval, workflow...  ..., and enterprise systems. Build MCP‑accessible read...  ...IaC/CI/CD, observability/evaluation (e.g., LangSmith),... 
    Flexible hours

    Scorpion Therapeutics

    Princeton, NJ
    5 days ago
  • $185k

     ...telemetry infrastructure for the AI era. At Cribl, we partner with IT...  ...You Will... Design, train, and evaluate machine learning models across a range of research and applied AI initiatives Run...  ...into practical, production-ready systems Build and maintain robust ML pipelines... 
    Temporary work
    Remote work

    Cribl, Inc

    Trenton, NJ
    2 days ago
  •  ...real business value with AI. What You'll Do...  ...deploy, and monitor AI and agentic applications on top of...  ...enterprise customers build trustworthy AI workflows - from...  ...Deep understanding of LLM integration patterns: RAG...  ...fine-tuning pipelines, evaluation frameworks, and agent... 
    Permanent employment
    Flexible hours

    Teradata

    Trenton, NJ
    5 days ago
  • $86.6k - $144.4k

    Data Scientist Build AI That Accelerates Scientific...  ...your work to help researchers solve humanity’s biggest...  ...building intelligent systems that can reason across...  ...more discoverable, trustworthy, connected, and actionable...  .... Fine-tune, evaluate, and integrate large language... 
    Local area

    RELX

    Trenton, NJ
    1 day ago
  • $100k - $150k

     ...we’re looking for a skilled AI Research Engineer (Applied AI) to join...  ...shipping advanced machine learning systems that solve high-impact...  ...research landscape, can critically evaluate new techniques for real-world...  ...-augmented generation, agentic systems, or multimodal architectures... 
    Full time
    H1b
    Local area
    Immediate start
    Remote work
    Visa sponsorship
    Work visa

    Bright Vision Technologies

    Hightstown, NJ
    6 days ago
  •  ...The AI Solutions Architect tasks include: • Architect & Design...  ...tuning, prompt engineering, agentic workflows, routing, multimodal...  ...Design and implement agentic systems using tool/function calling, planner...  ...and template versioning • Evaluate and integrate LLMs from both... 

    3B Staffing LLC

    Princeton, NJ
    3 days ago
  • $96.8k - $251.6k

     ...observability, and responsible AI delivery....  ...high-scale distributed systems, APIs, services, data pipelines...  .... Establish agentic-first engineering workflows...  ...RAG, agent harnesses, evaluation suites, workflow orchestration...  ..., guardrails, and LLM-backed experiences.... 
    Temporary work
    Flexible hours

    Oracle

    Trenton, NJ
    3 days ago
  • $120.3k - $222.6k

     ...as Market Intelligence and Primary Market Research managers. Essential Functions Strategic...  ...questions. Explore and develop innovative AI solutions to generate comprehensive...  ...generative AI models preferred. Experience with agentic AI models is plus. Strong Microsoft... 
    Work experience placement
    Work at office
    Local area
    Flexible hours
    Night shift

    Novo Nordisk A/S

    Plainsboro, NJ
    4 days ago
  • $170k - $215k

     ...organization (Citeline, Evaluate, MMIT, Panalgo, The...  ...opportunities to apply AI / ML to our content and products Conduct research and identify AI / ML algorithms...  ...to successful deployed systems, past successes have...  ...approaches, including: agentic systems and model... 
    Full time
    Temporary work
    Local area
    Remote work
    Flexible hours

    Norstella

    Trenton, NJ
    7 days ago
  • $107.66k - $161.7k

     ...explore and build with a wide variety of AI language models (bots), including o...  ...the current Machine Learning systems, building performant and reliable LLM applications and collaborating with...  ...software engineering experience via an internship, work experience, or coding... 
    Internship
    Remote job
    Full time
    Work experience placement

    Quora

    Trenton, NJ
    3 hours ago
  • $180.37k - $212.2k

     ...and with it, the future global financial system. To achieve our mission, we’re...  ...automation-first mindset, utilizing LLMs and agentic AI to build scalable, next-gen Data Loss Prevention...  ...against data security threats Evaluate and direct complex designs/controls across... 
    Local area

    Coinbase

    Trenton, NJ
    5 days ago
  •  ...Machine Learning Engineer to join a high-performing AI and Data Science team responsible for developing,...  .... Explore and implement Large Language Model (LLM) solutions where applicable. Participate in model evaluation, performance tuning, and continuous improvement initiatives... 

    GCS Recruitment

    Princeton, NJ
    3 days ago
  •  ...worldwide. Through ETS Research Institute and ETS...  ...Position Summary The AI Model Development Engineer...  ...designing, developing, evaluating, and deploying machine...  ...scaling AI-driven scoring systems for constructed...  ...transformer architectures and LLM fine-tuning and/or... 
    Worldwide

    Educational Testing Service

    Trenton, NJ
    4 days ago
  • $169.4k - $279.6k

     ...Team! As a member of AI and Emerging Technology...  ...thinking to architect AI systems that drive business...  ...ML, Generative AI, and Agentic AI systems — including...  ...business groups — spanning LLM selection and routing,...  ...problems and opportunities, evaluating information to... 
    Temporary work
    Work experience placement
    Local area

    Travelers Insurance

    Trenton, NJ
    3 days ago
