Misspelling Oblivious Word Embeddings

Edizel, Bora; Piktus, Aleksandra; Bojanowski, Piotr; Ferreira, Rui; Grave, Edouard; Silvestri, Fabrizio

arXiv:1905.09755 (cs)

[Submitted on 23 May 2019]

Abstract: In this paper we present a method to learn word embeddings that are resilient to misspellings. Existing word embeddings have limited applicability to malformed texts, which contain a non-negligible amount of out-of-vocabulary words. We propose a method combining FastText with subwords and a supervised task of learning misspelling patterns. In our method, misspellings of each word are embedded close to their correct variants. We train these embeddings on a new dataset we are releasing publicly. Finally, we experimentally show the advantages of this approach on both intrinsic and extrinsic NLP tasks using public test sets.

Comments:	9 Pages
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:1905.09755 [cs.CL]
	(or arXiv:1905.09755v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1905.09755

Submission history

From: Bora Edizel
[v1] Thu, 23 May 2019 16:28:08 UTC (631 KB)
