Master Thesis: Neural Sign Language Translation by Learning Tokenization

Orbay, Alptekin

Abstract:In this thesis, we propose a multitask learning based method to improve Neural Sign Language Translation (NSLT) consisting of two parts, a tokenization layer and Neural Machine Translation (NMT). The tokenization part focuses on how Sign Language (SL) videos should be represented to be fed into the other part. It has not been studied elaborately whereas NMT research has attracted several researchers contributing enormous advancements. Up to now, there are two main input tokenization levels, namely frame-level and gloss-level tokenization. Glosses are world-like intermediate presentation and unique to SLs. Therefore, we aim to develop a generic sign-level tokenization layer so that it is applicable to other domains without further effort. We begin with investigating current tokenization approaches and explain their weaknesses with several experiments. To provide a solution, we adapt Transfer Learning, Multitask Learning and Unsupervised Domain Adaptation into this research to leverage additional supervision. We succeed in enabling knowledge transfer between SLs and improve translation quality by 5 points in BLEU-4 and 8 points in ROUGE scores. Secondly, we show the effects of body parts by extensive experiments in all the tokenization approaches. Apart from these, we adopt 3D-CNNs to improve efficiency in terms of time and space. Lastly, we discuss the advantages of sign-level tokenization over gloss-level tokenization. To sum up, our proposed method eliminates the need for gloss level annotation to obtain higher scores by providing additional supervision by utilizing weak supervision sources.

Comments:	MS Thesis Extension of the Paper with the same name
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2011.09289 [cs.CL]
	(or arXiv:2011.09289v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2011.09289

Computer Science > Computation and Language

Title:Master Thesis: Neural Sign Language Translation by Learning Tokenization

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators