On the Effectiveness of Large Language Models in Automating Categorization of Scientific Texts

Shahi, Gautam Kishore; Hummel, Oliver

arXiv:2502.15745 (cs)

[Submitted on 8 Feb 2025]

Abstract: The rapid advancement of Large Language Models (LLMs) has opened up a multitude of application opportunities. Two traditional tasks for Information Retrieval systems are the summarization and classification of texts, both of which are important for helping humans navigate large bodies of literature, such as scientific publications. Because this body of scientific knowledge is growing rapidly, recent research has aimed at building research information systems that offer not only traditional keyword search capabilities but also novel features such as the automatic detection of research areas present at knowledge-intensive organizations in academia and industry. To facilitate this idea, we present the results of evaluating a variety of LLMs on their ability to sort scientific publications into hierarchical classification systems. Using the FORC dataset as ground truth, we found that recent LLMs (such as Meta Llama 3.1) reach an accuracy of up to 0.82, which is up to 0.08 higher than traditional BERT models.

Subjects:	Computation and Language (cs.CL); Digital Libraries (cs.DL); Machine Learning (cs.LG)
Cite as:	arXiv:2502.15745 [cs.CL]
	(or arXiv:2502.15745v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.15745

Submission history

From: Gautam Kishore Shahi
[v1] Sat, 8 Feb 2025 20:37:21 UTC (590 KB)
