A Library Perspective on Supervised Text Processing in Digital Libraries: An Investigation in the Biomedical Domain

Kroll, Hermann; Sackhoff, Pascal; Thang, Bill Matthias; Ksouri, Maha; Balke, Wolf-Tilo

doi:10.1145/3677389.3702557

arXiv:2411.12752 (cs)

[Submitted on 6 Nov 2024]

Authors: Hermann Kroll, Pascal Sackhoff, Bill Matthias Thang, Maha Ksouri, Wolf-Tilo Balke

Abstract: Digital libraries that maintain extensive textual collections may want to further enrich their content for certain downstream applications, e.g., building knowledge graphs, semantically enriching documents, or implementing novel access paths. All of these applications require some form of text processing, whether to identify relevant entities, extract semantic relationships between them, or classify documents into categories. However, implementing reliable, supervised workflows can become quite challenging for a digital library because suitable training data must be crafted and reliable models must be trained. While many works focus on achieving the highest accuracy on some benchmarks, we tackle the problem from the perspective of a digital library practitioner. In other words, we also consider trade-offs between accuracy and application costs, dive into training data generation through distant supervision and large language models such as ChatGPT, Llama, and OLMo, and discuss how to design final pipelines. To this end, we focus on relation extraction and text classification, using eight biomedical benchmarks as a showcase.

Comments:	JCDL 2024 Full Paper, 12 pages, 6 figures
Subjects:	Digital Libraries (cs.DL); Computation and Language (cs.CL)
ACM classes:	H.4
Cite as:	arXiv:2411.12752 [cs.DL]
	(or arXiv:2411.12752v1 [cs.DL] for this version)
	https://doi.org/10.48550/arXiv.2411.12752
Related DOI:	https://doi.org/10.1145/3677389.3702557

Submission history

From: Hermann Kroll
[v1] Wed, 6 Nov 2024 07:54:10 UTC (106 KB)
