Synthesized Annotation Guidelines are Knowledge-Lite Boosters for Clinical Information Extraction

Hsu, Enshuo; Ugbala, Martin; Kookal, Krishna Kumar; Kawtar, Zouaidi; Rider, Nicholas L.; Walji, Muhammad F.; Roberts, Kirk

arXiv:2504.02871 (cs)

[Submitted on 1 Apr 2025]

Abstract: Generative information extraction using large language models, particularly through few-shot learning, has become a popular method. Recent studies indicate that providing a detailed, human-readable guideline, similar to the annotation guidelines traditionally used for training human annotators, can significantly improve performance. However, constructing these guidelines is both labor- and knowledge-intensive. Additionally, the definitions are often tailored to meet specific needs, making them highly task-specific and often non-reusable. Handling these subtle differences requires considerable effort and attention to detail. In this study, we propose a self-improving method that harvests the knowledge summarization and text generation capacity of LLMs to synthesize annotation guidelines while requiring virtually no human input. Our zero-shot experiments on the clinical named entity recognition benchmarks 2012 i2b2 EVENT, 2012 i2b2 TIMEX, 2014 i2b2, and 2018 n2c2 showed improvements of 25.86%, 4.36%, 0.20%, and 7.75%, respectively, in strict F1 score over the no-guideline baseline. In most tasks, the LLM-synthesized guidelines performed on par with or better than human-written guidelines, with gains of 1.15% to 4.14%. In conclusion, this study proposes a novel LLM self-improving method that requires minimal knowledge and human input and is applicable to multiple biomedical domains.
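The abstract reports results as strict F1 scores. As a minimal sketch of how such a score is commonly computed for named entity recognition, assuming "strict" means a predicted entity counts only when both its span boundaries and its type match a gold entity exactly, the example below may help. The `Entity` tuple layout, the `strict_f1` function name, and the sample entities are hypothetical illustrations, not the authors' evaluation code.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (start_offset, end_offset, entity_type).
Entity = Tuple[int, int, str]


def strict_f1(gold: Set[Entity], predicted: Set[Entity]) -> float:
    """Strict F1: a prediction is correct only if start, end, and type all match."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        # Also covers empty gold or empty predictions (avoids division by zero).
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Illustrative data only: the second prediction has the right type but a
# wrong end offset, so under strict matching it counts as an error.
gold = {(0, 9, "PROBLEM"), (15, 24, "TREATMENT"), (30, 38, "TEST")}
pred = {(0, 9, "PROBLEM"), (15, 22, "TREATMENT"), (30, 38, "TEST")}
score = strict_f1(gold, pred)
print(f"strict F1 = {score:.4f}")
```

Under this definition a near-miss on a boundary costs both precision and recall, which is why strict F1 is usually lower than relaxed (overlap-based) variants.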

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Cite as:	arXiv:2504.02871 [cs.CL]
	(or arXiv:2504.02871v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2504.02871

Submission history

From: Enshuo Hsu
[v1] Tue, 1 Apr 2025 15:59:04 UTC (611 KB)
