PeTailor: Improving Large Language Model by Tailored Chunk Scorer in Biomedical Triple Extraction

Li, Mingchen; Chen, M.; Zhou, Huixue; Zhang, Rui

arXiv:2310.18463v1 (cs)

[Submitted on 27 Oct 2023 (this version), latest version 27 Apr 2024 (v6)]

Abstract: The automatic extraction of biomedical entities and their interactions from unstructured data remains a challenging task due to the limited availability of expert-labeled standard datasets. In this paper, we introduce PETAILOR, a retrieval-based language framework that is augmented by a tailored chunk scorer. Unlike previous retrieval-augmented language models (LMs) that retrieve relevant documents by calculating the similarity between the input sentence and the candidate document set, PETAILOR segments the sentence into chunks and retrieves the relevant chunk from our pre-computed chunk-based relational key-value memory. Moreover, in order to capture the specific requirements of the LM, PETAILOR adapts the tailored chunk scorer to the LM. We also introduce GM-CIHT, an expert-annotated biomedical triple extraction dataset with more relation types. This dataset is centered on non-drug treatment and the general biomedical domain. Additionally, we investigate the efficacy of triple extraction models trained on general domains when applied to the biomedical domain. Our experiments reveal that PETAILOR achieves state-of-the-art performance on GM-CIHT.
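
The chunk-level retrieval described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed word-window chunking, the toy memory entries, and the bag-of-words cosine score are all assumptions made for illustration. In particular, the cosine score stands in for the tailored chunk scorer, which the abstract says is adapted to the LM.

```python
import math
from collections import Counter

def split_into_chunks(sentence, size=3):
    """Split a sentence into consecutive word windows of at most `size` words.
    (Hypothetical chunking rule; the abstract does not specify one.)"""
    words = sentence.lower().split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(sentence, memory, size=3):
    """Return (score, key, value) for the memory entry that best matches
    any chunk of the sentence. The cosine score is a stand-in scorer."""
    best = (0.0, None, None)
    for chunk in split_into_chunks(sentence, size):
        chunk_vec = Counter(chunk.split())
        for key, value in memory.items():
            score = cosine(chunk_vec, Counter(key.split()))
            if score > best[0]:
                best = (score, key, value)
    return best

# Toy key-value memory: chunk text -> relation label.
# These entries are made up for illustration only.
memory = {
    "exercise reduces blood pressure": "treats",
    "smoking increases cancer risk": "causes",
}

score, key, value = retrieve(
    "regular exercise reduces blood pressure in adults", memory
)
print(key, value)
```

The point of the sketch is the shape of the lookup: the query is a chunk of the sentence rather than the whole sentence, and the retrieved item is a value stored under a pre-computed chunk key rather than a full document.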

Comments:	this is the first preprint version
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2310.18463 [cs.CL]
	(or arXiv:2310.18463v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2310.18463

Submission history

From: Mingchen Li
[v1] Fri, 27 Oct 2023 20:15:23 UTC (1,040 KB)
[v2] Mon, 12 Feb 2024 17:05:48 UTC (1,366 KB)
[v3] Tue, 13 Feb 2024 13:57:27 UTC (1,366 KB)
[v4] Tue, 16 Apr 2024 15:00:06 UTC (188 KB)
[v5] Wed, 17 Apr 2024 12:03:27 UTC (188 KB)
[v6] Sat, 27 Apr 2024 13:28:18 UTC (188 KB)
