PeTailor: Improving Large Language Model by Tailored Chunk Scorer in Biomedical Triple Extraction

Li, Mingchen; Chen, M.; Zhou, Huixue; Kilicoglu, Halil; Zhang, Rui

arXiv:2310.18463v2 (cs)

[Submitted on 27 Oct 2023 (v1), revised 12 Feb 2024 (this version, v2), latest version 27 Apr 2024 (v6)]

Abstract: Biomedical triple extraction systems aim to automatically extract biomedical entities and the relations between them. While current unified information extraction models achieve state-of-the-art performance, they struggle to understand relationships between entities within intricate biomedical sentences. Furthermore, the absence of a high-quality biomedical triple extraction dataset impedes progress in developing robust triple extraction systems. To tackle these challenges, we propose a novel retrieval-based framework for biomedical triple extraction, namely PeTailor, which explicitly retrieves the relevant document from our pre-built diverse chunk database using a novel tailored chunk scorer and integrates the retrieved information into the input of a Large Language Model (LLM) to generate the corresponding triple (head entity, relation, tail entity) for the input sentence. Additionally, we present GM-CIHT, an expert-annotated biomedical triple extraction dataset that covers a wider range of relation types. Experimental results show that our proposed PeTailor method achieves state-of-the-art performance on GM-CIHT and two standard biomedical triple extraction datasets.

Comments:	this is the second preprint version
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2310.18463 [cs.CL]
	(or arXiv:2310.18463v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2310.18463

Submission history

From: Mingchen Li
[v1] Fri, 27 Oct 2023 20:15:23 UTC (1,040 KB)
[v2] Mon, 12 Feb 2024 17:05:48 UTC (1,366 KB)
[v3] Tue, 13 Feb 2024 13:57:27 UTC (1,366 KB)
[v4] Tue, 16 Apr 2024 15:00:06 UTC (188 KB)
[v5] Wed, 17 Apr 2024 12:03:27 UTC (188 KB)
[v6] Sat, 27 Apr 2024 13:28:18 UTC (188 KB)
