Improving Multilingual Named Entity Recognition with Wikipedia Entity Type Mapping

Ni, Jian; Florian, Radu

doi:10.18653/v1/D16-1135

arXiv:1707.02459 (cs)

[Submitted on 8 Jul 2017]

Abstract: The state-of-the-art named entity recognition (NER) systems are statistical machine learning models that have strong generalization capability (i.e., can recognize unseen entities that do not appear in training data) based on lexical and contextual information. However, such a model could still make mistakes if its features favor a wrong entity type. In this paper, we utilize Wikipedia as an open knowledge base to improve multilingual NER systems. Central to our approach is the construction of high-accuracy, high-coverage multilingual Wikipedia entity type mappings. These mappings are built from weakly annotated data and can be extended to new languages with no human annotation or language-dependent knowledge involved. Based on these mappings, we develop several approaches to improve an NER system. We evaluate the performance of the approaches via experiments on NER systems trained for 6 languages. Experimental results show that the proposed approaches are effective in improving the accuracy of such systems on unseen entities, especially when a system is applied to a new domain or it is trained with little training data (up to 18.3 F1 score improvement).

Comments:	11 pages, Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Cite as:	arXiv:1707.02459 [cs.CL]
	(or arXiv:1707.02459v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1707.02459
Related DOI:	https://doi.org/10.18653/v1/D16-1135

Submission history

From: Jian Ni
[v1] Sat, 8 Jul 2017 16:17:04 UTC (25 KB)
