MedJEx: A Medical Jargon Extraction Model with Wiki's Hyperlink Span and Contextualized Masked Language Model Score

Kwon, Sunjae; Yao, Zonghai; Jordan, Harmon S.; Levy, David A.; Corner, Brian; Yu, Hong

Computer Science > Computation and Language

arXiv:2210.05875 (cs)

[Submitted on 12 Oct 2022]

Title:MedJEx: A Medical Jargon Extraction Model with Wiki's Hyperlink Span and Contextualized Masked Language Model Score

Authors:Sunjae Kwon, Zonghai Yao, Harmon S. Jordan, David A. Levy, Brian Corner, Hong Yu

View PDF

Abstract:This paper proposes a new natural language processing (NLP) application for identifying medical jargon terms potentially difficult for patients to comprehend from electronic health record (EHR) notes. We first present a novel and publicly available dataset with expert-annotated medical jargon terms from 18K+ EHR note sentences ($MedJ$). Then, we introduce a novel medical jargon extraction ($MedJEx$) model which has been shown to outperform existing state-of-the-art NLP models. First, MedJEx improved the overall performance when it was trained on an auxiliary Wikipedia hyperlink span dataset, where hyperlink spans provide additional Wikipedia articles to explain the spans (or terms), and then fine-tuned on the annotated MedJ data. Secondly, we found that a contextualized masked language model score was beneficial for detecting domain-specific unfamiliar jargon terms. Moreover, our results show that training on the auxiliary Wikipedia hyperlink span datasets improved six out of eight biomedical named entity recognition benchmark datasets. Both MedJ and MedJEx are publicly available.

Comments:	Accepted to EMNLP 22
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2210.05875 [cs.CL]
	(or arXiv:2210.05875v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2210.05875

Submission history

From: Sunjae Kwon [view email]
[v1] Wed, 12 Oct 2022 02:27:32 UTC (883 KB)

Computer Science > Computation and Language

Title:MedJEx: A Medical Jargon Extraction Model with Wiki's Hyperlink Span and Contextualized Masked Language Model Score

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computation and Language

Title:MedJEx: A Medical Jargon Extraction Model with Wiki's Hyperlink Span and Contextualized Masked Language Model Score

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators