NEREL-BIO: A Dataset of Biomedical Abstracts Annotated with Nested Named Entities

Loukachevitch, Natalia; Manandhar, Suresh; Baral, Elina; Rozhkov, Igor; Braslavski, Pavel; Ivanov, Vladimir; Batura, Tatiana; Tutubalina, Elena

doi:10.1093/bioinformatics/btad161

arXiv:2210.11913 (cs)

[Submitted on 21 Oct 2022]

Abstract: This paper describes NEREL-BIO, an annotation scheme and corpus of PubMed abstracts in Russian, together with a smaller number of abstracts in English. NEREL-BIO extends the general-domain dataset NEREL by introducing domain-specific entity types. The NEREL-BIO annotation scheme covers both the general and biomedical domains, making it suitable for domain transfer experiments. NEREL-BIO provides annotation for nested named entities as an extension of the scheme employed for NEREL. Nested named entities may cross entity boundaries to connect to shorter entities nested within longer entities, making them harder to detect.
NEREL-BIO contains annotations for 700+ Russian and 100+ English abstracts. All English PubMed annotations have corresponding Russian counterparts. NEREL-BIO thus offers the following features: it annotates nested named entities, and it can be used as a benchmark for cross-domain (NEREL -> NEREL-BIO) and cross-language (English -> Russian) transfer. We experiment with both transformer-based sequence models and machine reading comprehension (MRC) models and report their results.
The dataset is freely available at this https URL.
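
To make the notion of nested named entities concrete, the following is a minimal Python sketch, not the dataset's actual format or tooling. It assumes entities are stored as character-offset spans with a label. The `Span` type, the `nested_pairs` helper, the example phrase, and the labels `DISEASE` and `ANATOMY` are hypothetical illustrations and are not taken from the NEREL-BIO annotation scheme.

```python
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    """A labeled entity mention (hypothetical representation)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # illustrative label, not an official NEREL-BIO type


def nested_pairs(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return (outer, inner) pairs where inner lies inside outer and is shorter."""
    pairs = []
    for outer in spans:
        for inner in spans:
            contained = outer.start <= inner.start and inner.end <= outer.end
            shorter = (inner.end - inner.start) < (outer.end - outer.start)
            if contained and shorter:
                pairs.append((outer, inner))
    return pairs


# Hypothetical example: a longer mention that contains a shorter one.
text = "lung cancer"
spans = [
    Span(0, 11, "DISEASE"),  # "lung cancer"
    Span(0, 4, "ANATOMY"),   # "lung"
]

pairs = nested_pairs(spans)
for outer, inner in pairs:
    print(f"{text[inner.start:inner.end]!r} ({inner.label}) is nested in "
          f"{text[outer.start:outer.end]!r} ({outer.label})")
```

A flat tagging scheme that assigns one label per token cannot represent both spans at once, which is one reason nested entities are harder to detect than flat ones.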

Comments:	Submitted to Bioinformatics (Publisher: Oxford University Press)
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2210.11913 [cs.CL]
	(or arXiv:2210.11913v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2210.11913
Journal reference:	Bioinformatics, Volume 39, Issue 4, April 2023, btad161
Related DOI:	https://doi.org/10.1093/bioinformatics/btad161

Submission history

From: Elena Tutubalina
[v1] Fri, 21 Oct 2022 12:28:43 UTC (285 KB)
