Improving part-of-speech tagging via multi-task learning and character-level word representations

Anastasyev, Daniil; Gusev, Ilya; Indenbom, Eugene

arXiv:1807.00818 (cs)

[Submitted on 2 Jul 2018]

Abstract: In this paper, we explore ways to improve POS tagging using various types of auxiliary losses and different word representations. As a baseline, we used a BiLSTM tagger, which is able to achieve state-of-the-art results on sequence labelling tasks. We developed a new method for character-level word representation using a feedforward neural network. This representation gave us better results in terms of both the speed and the performance of the model. We also applied a novel technique of pretraining such word representations with existing word vectors. Finally, we designed a new variant of auxiliary loss for sequence labelling tasks: an additional prediction of the neighbour labels. This loss forces the model to learn the dependencies inside a sequence of labels and accelerates training. We test these methods on English and Russian.
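
The neighbour-label auxiliary loss described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the function names, the `<pad>` boundary label, the choice of exactly one previous and one next neighbour, and the `aux_weight` hyperparameter are all hypothetical and are not taken from the paper. The model's outputs are represented as plain per-token probability dictionaries rather than the output of a real BiLSTM.

```python
import math

PAD = "<pad>"  # hypothetical boundary label for positions with no neighbour


def neighbour_targets(tags):
    """Return (previous, next) label sequences aligned with `tags`."""
    prev_tags = ([PAD] + tags)[:len(tags)]  # label of the token to the left
    next_tags = (tags + [PAD])[1:]          # label of the token to the right
    return prev_tags, next_tags


def cross_entropy(prob_rows, gold):
    """Mean negative log-likelihood of the gold labels.

    prob_rows: one dict per token mapping label -> predicted probability.
    gold: non-empty list of gold labels, one per token.
    """
    return -sum(math.log(row[g]) for row, g in zip(prob_rows, gold)) / len(gold)


def multitask_loss(main_probs, prev_probs, next_probs, tags, aux_weight=0.5):
    """Main tagging loss plus a weighted loss for predicting neighbour labels.

    aux_weight is an assumed hyperparameter, not a value from the paper.
    """
    prev_tags, next_tags = neighbour_targets(tags)
    main = cross_entropy(main_probs, tags)
    aux = cross_entropy(prev_probs, prev_tags) + cross_entropy(next_probs, next_tags)
    return main + aux_weight * aux


if __name__ == "__main__":
    tags = ["DET", "NOUN", "VERB"]
    print(neighbour_targets(tags))
```

In this sketch each token gets three prediction heads (its own label, the previous label, the next label), and the auxiliary terms are simply added to the main loss. At inference time only the main head would be needed; the auxiliary heads exist to shape the shared representation during training.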

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1807.00818 [cs.CL]
	(or arXiv:1807.00818v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1807.00818
Journal reference:	Computational Linguistics and Intellectual Technologies, Papers from the Annual International Conference "Dialogue" (2018) Issue 17, 14-27

Submission history

From: Ilya Gusev
[v1] Mon, 2 Jul 2018 13:04:52 UTC (411 KB)
