Challenges in Testing Large Language Model Based Software: A Faceted Taxonomy

Dobslaw, Felix; Feldt, Robert; Yoon, Juyeon; Yoo, Shin

arXiv:2503.00481 (cs)

[Submitted on 1 Mar 2025]

Abstract: Large Language Models (LLMs) and Multi-Agent LLMs (MALLMs) introduce non-determinism unlike traditional or machine learning software, requiring new approaches to verifying correctness beyond simple output comparisons or statistical accuracy over test datasets.
This paper presents a taxonomy for LLM test case design, informed by the research literature, our experience, and open-source tools that represent the state of practice. We identify key variation points that impact test correctness and highlight open challenges that the research, industry, and open-source communities must address as LLMs become integral to software systems.
Our taxonomy defines four facets of LLM test case design, addressing ambiguity in both inputs and outputs while establishing best practices. It distinguishes variability in goals, the system under test, and inputs, and introduces two key oracle types: atomic and aggregated. Our mapping indicates that current tools insufficiently account for these variability points, highlighting the need for closer collaboration between academia and practitioners to improve the reliability and reproducibility of LLM testing.
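
The abstract names the two oracle types without defining them, so the following is only a minimal sketch under one plausible reading: an atomic oracle gives a verdict on a single output, while an aggregated oracle gives a verdict over the atomic verdicts from repeated runs of a non-deterministic system. All names here (fake_llm, atomic_oracle, aggregated_oracle, run_test, min_pass_rate) are hypothetical and are not taken from the paper or from any tool it surveys.

```python
import random
from typing import Callable, List


def fake_llm(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a non-deterministic LLM call (assumption)."""
    return rng.choice(["Paris", "paris.", "The capital is Paris", "Lyon"])


def atomic_oracle(output: str) -> bool:
    """Verdict on ONE output: does it mention the expected answer?"""
    return "paris" in output.lower()


def aggregated_oracle(verdicts: List[bool], min_pass_rate: float) -> bool:
    """Verdict over MANY runs: pass if enough atomic verdicts passed."""
    if not verdicts:
        raise ValueError("need at least one atomic verdict")
    return sum(verdicts) / len(verdicts) >= min_pass_rate


def run_test(
    sut: Callable[[str, random.Random], str],
    prompt: str,
    runs: int,
    min_pass_rate: float,
    seed: int,
) -> bool:
    """Run the system under test repeatedly and aggregate the verdicts."""
    rng = random.Random(seed)  # fixed seed keeps the sampled runs reproducible
    verdicts = [atomic_oracle(sut(prompt, rng)) for _ in range(runs)]
    return aggregated_oracle(verdicts, min_pass_rate)


if __name__ == "__main__":
    passed = run_test(fake_llm, "What is the capital of France?",
                      runs=20, min_pass_rate=0.6, seed=0)
    print("test passed:", passed)
```

In this sketch the threshold min_pass_rate is itself a variation point: the same system and inputs can pass or fail depending on how many runs are sampled and how strict the aggregation is, which is the kind of test-design choice the abstract says needs to be made explicit.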

Subjects:	Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2503.00481 [cs.SE]
	(or arXiv:2503.00481v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2503.00481

Submission history

From: Felix Dobslaw
[v1] Sat, 1 Mar 2025 13:15:56 UTC (16 KB)
