Generalized and Transferable Patient Language Representation for Phenotyping with Limited Data

Si, Yuqi; Bernstam, Elmer V; Roberts, Kirk

doi:10.1016/j.jbi.2021.103726

arXiv:2103.00482 (cs)

[Submitted on 24 Feb 2021]

Abstract: The paradigm of representation learning through transfer learning has the potential to greatly enhance clinical natural language processing. In this work, we propose a multi-task pre-training and fine-tuning approach for learning generalized and transferable patient representations from medical language. The model is first pre-trained on different but related high-prevalence phenotypes and then fine-tuned on downstream target tasks. Our main contribution focuses on the impact this technique can have on low-prevalence phenotypes, a challenging task due to the dearth of data. We validate the representation from pre-training, and fine-tune the multi-task pre-trained models on low-prevalence phenotypes including 38 circulatory diseases, 23 respiratory diseases, and 17 genitourinary diseases. We find that multi-task pre-training increases learning efficiency and achieves consistently high performance across the majority of phenotypes. Most importantly, multi-task pre-training is almost always either the best-performing model or performs tolerably close to the best-performing model, a property we refer to as robustness. These results lead us to conclude that this multi-task transfer learning architecture is a robust approach for developing generalized and transferable patient language representations for numerous phenotypes.
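The two-stage idea in the abstract (pre-train one shared representation on several high-prevalence phenotypes, then fine-tune on a low-prevalence target) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `MultiTaskModel` class, the tiny tanh encoder with one logistic head per task, the task names, the synthetic bag-of-words data, and all hyperparameters. This listing does not describe the authors' actual architecture or data.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class MultiTaskModel:
    """Hypothetical shared encoder h = tanh(W x) with one logistic head per task."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)]
                  for _ in range(n_hidden)]
        self.heads = {}  # task name -> [hidden weights..., bias]

    def add_head(self, task):
        self.heads[task] = [0.0] * (len(self.W) + 1)

    def encode(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.W]

    def predict(self, x, task):
        h, v = self.encode(x), self.heads[task]
        return sigmoid(sum(vi * hi for vi, hi in zip(v, h)) + v[-1])

    def fit(self, X, Y, tasks, epochs=200, lr=0.1):
        """Full-batch gradient descent on the summed binary cross-entropy.

        Y[i][t] is the 0/1 label of example i for tasks[t].
        Returns the mean loss recorded at the start of each epoch.
        """
        n, H = len(X), len(self.W)
        history = []
        for _ in range(epochs):
            grad_W = [[0.0] * len(row) for row in self.W]
            grad_heads = {t: [0.0] * (H + 1) for t in tasks}
            loss = 0.0
            for x, y in zip(X, Y):
                h = self.encode(x)
                dh = [0.0] * H  # gradient flowing back into the shared encoder
                for t, label in zip(tasks, y):
                    v = self.heads[t]
                    p = sigmoid(sum(vi * hi for vi, hi in zip(v, h)) + v[-1])
                    p = min(max(p, 1e-12), 1 - 1e-12)
                    loss -= label * math.log(p) + (1 - label) * math.log(1 - p)
                    err = p - label
                    for j in range(H):
                        grad_heads[t][j] += err * h[j]
                        dh[j] += err * v[j]
                    grad_heads[t][H] += err
                for j in range(H):
                    dz = dh[j] * (1.0 - h[j] ** 2)  # derivative of tanh
                    for k, xk in enumerate(x):
                        grad_W[j][k] += dz * xk
            history.append(loss / n)
            for j in range(H):
                for k in range(len(self.W[j])):
                    self.W[j][k] -= lr * grad_W[j][k] / n
            for t in tasks:
                for j in range(H + 1):
                    self.heads[t][j] -= lr * grad_heads[t][j] / n
        return history


def make_notes(n, rng):
    """Synthetic binary bag-of-words vectors; not real clinical data."""
    return [[float(rng.random() < 0.5) for _ in range(6)] for _ in range(n)]


rng = random.Random(1)
# Stage 1: many examples, two invented high-prevalence labels.
X_src = make_notes(200, rng)
Y_src = [[int(x[0] or x[1]), int(x[2])] for x in X_src]
# Stage 2: few examples, one invented low-prevalence label.
X_tgt = make_notes(20, rng)
Y_tgt = [[int(x[0] and x[2])] for x in X_tgt]

model = MultiTaskModel(n_features=6, n_hidden=4)
for task in ("source_a", "source_b"):
    model.add_head(task)
pre_hist = model.fit(X_src, Y_src, ["source_a", "source_b"])  # multi-task pre-training

model.add_head("target")
ft_hist = model.fit(X_tgt, Y_tgt, ["target"], epochs=100)  # fine-tuning reuses W
```

The design point the sketch tries to capture is that the encoder weights `W` are updated by every source task during pre-training and are then reused, not re-initialized, when the new `target` head is trained on the small dataset.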

Comments:	Journal of Biomedical Informatics (in press)
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2103.00482 [cs.CL]
	(or arXiv:2103.00482v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2103.00482
Related DOI:	https://doi.org/10.1016/j.jbi.2021.103726

Submission history

From: Yuqi Si
[v1] Wed, 24 Feb 2021 18:18:02 UTC (1,086 KB)
