Tagged End-to-End Simultaneous Speech Translation Training using Simultaneous Interpretation Data

Ko, Yuka; Fukuda, Ryo; Nishikawa, Yuta; Kano, Yasumasa; Sudoh, Katsuhito; Nakamura, Satoshi

arXiv:2306.08582 (cs)

[Submitted on 14 Jun 2023]

Abstract: Simultaneous speech translation (SimulST) translates partial speech inputs incrementally. Although a monotonic correspondence between input and output is preferable for lower latency, such a correspondence does not hold for distant language pairs such as English and Japanese. A promising approach to this problem is to mimic simultaneous interpretation (SI) by using SI data to train a SimulST model. However, the amount of such SI data is limited, so it should be used together with ordinary bilingual data whose translations were produced offline. In this paper, we propose an effective way to train a SimulST model on a mixture of SI and offline data. The proposed method trains a single model on the mixed data with style tags that tell the model to generate SI-style or offline-style outputs. Experimental results show BLEURT improvements across different latency ranges, and our analyses reveal that the proposed model generates more SI-style outputs than the baseline.
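
The style-tagging idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the tag strings `<si>` and `<off>`, the choice to prepend the tag to the target text, the `Example` container, and the placeholder file names and sentences. It only shows how SI and offline examples might be merged into one tagged training set.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical tag tokens; the abstract does not state the actual strings.
STYLE_TAGS = {"si": "<si>", "offline": "<off>"}


@dataclass
class Example:
    audio_path: str  # source speech file (placeholder path)
    target: str      # target-language text
    style: str       # "si" or "offline"


def tag_target(example: Example) -> str:
    """Prepend the style tag to the target text (assumed tag placement)."""
    return f"{STYLE_TAGS[example.style]} {example.target}"


def build_tagged_corpus(
    si_data: Iterable[Example], offline_data: Iterable[Example]
) -> List[Tuple[str, str]]:
    """Merge SI and offline examples into (audio, tagged target) pairs."""
    mixed = list(si_data) + list(offline_data)
    return [(ex.audio_path, tag_target(ex)) for ex in mixed]


# Placeholder data: the same utterance paired with two target styles.
si_data = [Example("talk1.wav", "si-style translation", "si")]
offline_data = [Example("talk1.wav", "offline-style translation", "offline")]

tagged = build_tagged_corpus(si_data, offline_data)
for audio, target in tagged:
    print(audio, target)
```

Under this assumed setup, a single model trained on `tagged` would see the tag as part of each target, and the desired output style could be requested at inference time by supplying the corresponding tag.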

Comments:	Accepted to IWSLT 2023 as a scientific paper
Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2306.08582 [cs.CL]
	(or arXiv:2306.08582v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2306.08582

Submission history

From: Yuka Ko
[v1] Wed, 14 Jun 2023 15:42:06 UTC (7,339 KB)
