FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents

Thakur, Nandan; Lin, Jimmy; Havens, Sam; Carbin, Michael; Khattab, Omar; Drozdov, Andrew


arXiv:2504.13128 (cs)

[Submitted on 17 Apr 2025]


Abstract: We introduce FreshStack, a reusable framework for automatically building information retrieval (IR) evaluation benchmarks from community-asked questions and answers. FreshStack conducts the following steps: (1) automatic corpus collection from code and technical documentation, (2) nugget generation from community-asked questions and answers, and (3) nugget-level support, retrieving documents using a fusion of retrieval techniques and hybrid architectures. We use FreshStack to build five datasets on fast-growing, recent, and niche topics to ensure the tasks are sufficiently challenging. On FreshStack, existing retrieval models, when applied out-of-the-box, significantly underperform oracle approaches on all five topics, denoting plenty of headroom to improve IR quality. In addition, we identify cases where rerankers do not clearly improve first-stage retrieval accuracy (two out of five topics). We hope that FreshStack will facilitate future work toward constructing realistic, scalable, and uncontaminated IR and RAG evaluation benchmarks. FreshStack datasets are available at: this https URL.
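
Step (3) of the abstract mentions retrieving documents with "a fusion of retrieval techniques" but does not say which fusion method is used. As a rough illustration only, the sketch below uses reciprocal rank fusion (RRF), a common way to merge ranked lists from several retrievers. The choice of RRF, the function name `reciprocal_rank_fusion`, the constant `k=60`, and the toy document ids are all assumptions for illustration, not details taken from FreshStack.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one list.

    Hypothetical helper: RRF is assumed here as one possible fusion
    method; the abstract does not name the method FreshStack uses.
    Each document receives 1 / (k + rank) from every list it appears in.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id
    # so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Toy ranked lists standing in for, e.g., a lexical retriever and
# two dense retrievers (illustrative ids only).
fused = reciprocal_rank_fusion([
    ["a", "b", "c"],
    ["b", "c", "a"],
    ["b", "a", "d"],
])
print(fused)
```

A rank-based scheme like this needs no score normalization across retrievers, which is one reason it is often a first choice when combining heterogeneous systems; whether it matches the paper's setup would have to be checked against the full text.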

Subjects:	Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2504.13128 [cs.IR]
	(or arXiv:2504.13128v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2504.13128

Submission history

From: Nandan Thakur
[v1] Thu, 17 Apr 2025 17:44:06 UTC (2,574 KB)
