Generalizing over Long Tail Concepts for Medical Term Normalization

Portelli, Beatrice; Scaboro, Simone; Santus, Enrico; Sedghamiz, Hooman; Chersoni, Emmanuele; Serra, Giuseppe

arXiv:2210.11947 (cs)

[Submitted on 21 Oct 2022 (v1), last revised 3 Nov 2022 (this version, v2)]

Abstract: Medical term normalization consists of mapping a piece of text to one of a large number of output classes. Given the small size of the annotated datasets and the extremely long-tailed distribution of the concepts, it is of utmost importance to develop models that are capable of generalizing to scarce or unseen concepts. An important attribute of most target ontologies is their hierarchical structure. In this paper we introduce a simple and effective learning strategy that leverages such information to enhance the generalizability of both discriminative and generative models. The evaluation shows that the proposed strategy produces state-of-the-art performance on seen concepts and consistent improvements on unseen ones, while also allowing efficient zero-shot knowledge transfer across text typologies and datasets.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Cite as:	arXiv:2210.11947 [cs.CL]
	(or arXiv:2210.11947v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2210.11947

Submission history

From: Enrico Santus
[v1] Fri, 21 Oct 2022 13:17:36 UTC (103 KB)
[v2] Thu, 3 Nov 2022 15:06:57 UTC (101 KB)
