Transformer based Grapheme-to-Phoneme Conversion

Yolchuyeva, Sevinj; Németh, Géza; Gyires-Tóth, Bálint

doi:10.21437/Interspeech.2019-1954

arXiv:2004.06338 (eess)

[Submitted on 14 Apr 2020 (v1), last revised 26 Jun 2020 (this version, v2)]

Abstract: The attention mechanism is one of the most successful techniques in deep learning based Natural Language Processing (NLP). The transformer network architecture is based entirely on attention mechanisms, and it outperforms sequence-to-sequence models in neural machine translation without using recurrent or convolutional layers. Grapheme-to-phoneme (G2P) conversion is the task of converting letters (a grapheme sequence) to their pronunciation (a phoneme sequence). It plays a significant role in text-to-speech (TTS) and automatic speech recognition (ASR) systems. In this paper, we investigate the application of the transformer architecture to G2P conversion and compare its performance with recurrent and convolutional neural network based approaches. Phoneme and word error rates are evaluated on the CMUDict dataset for US English and on the NetTalk dataset. The results show that transformer based G2P outperforms the convolutional approach in terms of word error rate, and that it significantly outperforms previous recurrent approaches (without attention) in both word and phoneme error rate on both datasets. Furthermore, the proposed model is much smaller than the models of the previous approaches.
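
To make the two evaluation metrics concrete, here is a minimal Python sketch of how phoneme error rate (PER) and word error rate (WER) are commonly computed for G2P output. It is not the authors' evaluation code. It assumes one reference pronunciation per word, PER defined as total Levenshtein edits divided by total reference phonemes, and WER defined as the fraction of words whose predicted phoneme sequence differs from the reference. The function names and the ARPAbet example pairs are illustrative.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def g2p_error_rates(
    pairs: List[Tuple[Sequence[str], Sequence[str]]]
) -> Tuple[float, float]:
    """Return (PER, WER) for a list of (reference, hypothesis) pairs.

    Assumption: a single reference pronunciation per word.
    """
    if not pairs:
        raise ValueError("pairs must not be empty")
    total_edits = 0
    total_phonemes = 0
    wrong_words = 0
    for ref, hyp in pairs:
        dist = edit_distance(ref, hyp)
        total_edits += dist
        total_phonemes += len(ref)
        if dist > 0:
            wrong_words += 1  # any phoneme error makes the whole word wrong
    return total_edits / total_phonemes, wrong_words / len(pairs)


# Illustrative data only: "cat" predicted correctly, "dog" with one substitution.
example_pairs = [
    (["K", "AE", "T"], ["K", "AE", "T"]),
    (["D", "AO", "G"], ["D", "AA", "G"]),
]
per, wer = g2p_error_rates(example_pairs)
```

In this example one of six reference phonemes is wrong and one of two words is wrong. A single phoneme error counts a whole word as incorrect, so WER is typically much higher than PER on the same predictions.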

Comments:	INTERSPEECH 2019
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:2004.06338 [eess.AS]
	(or arXiv:2004.06338v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2004.06338
Related DOI:	https://doi.org/10.21437/Interspeech.2019-1954

Submission history

From: Sevinj Yolchuyeva
[v1] Tue, 14 Apr 2020 07:48:15 UTC (164 KB)
[v2] Fri, 26 Jun 2020 21:09:53 UTC (340 KB)
