Direct Speech-to-Speech Neural Machine Translation: A Survey

Gupta, Mahendra; Dutta, Maitreyee; Maurya, Chandresh Kumar

arXiv:2411.14453 (cs)

[Submitted on 13 Nov 2024]

Abstract: Speech-to-Speech Translation (S2ST) models translate speech in one language into speech in a target language while preserving the linguistic content. S2ST is important for bridging the communication gap among communities and has diverse applications. In recent years, researchers have introduced direct S2ST models, which can potentially translate speech without relying on intermediate text generation, offer lower decoding latency, and preserve paralinguistic and non-linguistic features. However, direct S2ST has yet to reach the quality needed for seamless communication and still lags behind cascade models in performance, especially in real-world translation. To the best of our knowledge, no comprehensive survey of direct S2ST systems is available for beginners and advanced researchers to consult as a quick reference. The present work provides a comprehensive review of direct S2ST models, data and application issues, and performance metrics. We critically analyze the models' performance on the benchmark datasets and outline research challenges and future directions.

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as: arXiv:2411.14453 [cs.CL] (or arXiv:2411.14453v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2411.14453

Submission history

From: Mahendra Gupta
[v1] Wed, 13 Nov 2024 13:01:21 UTC (371 KB)
