Learning to pronounce as measuring cross-lingual joint orthography-phonology complexity

Rosati, Domenic

arXiv:2202.00794 (cs)

[Submitted on 29 Jan 2022 (v1), last revised 9 Feb 2022 (this version, v2)]

Abstract: Machine learning models allow us to compare languages by showing how hard a task in each language might be to learn and perform well on. Following this line of investigation, we explore what makes a language "hard to pronounce" by modelling the task of grapheme-to-phoneme (g2p) transliteration. By training a character-level transformer model on this task across 22 languages and measuring the model's proficiency against each language's grapheme and phoneme inventories, we show that certain characteristics emerge that separate easier and harder languages with respect to learning to pronounce. Namely, the complexity of a language's pronunciation from its orthography is due to the expressiveness or simplicity of its grapheme-to-phoneme mapping. Further discussion illustrates how future studies should consider relative data sparsity per language to design fairer cross-lingual comparison tasks.
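
To make the measurement in the abstract concrete, the following is a minimal sketch of how a g2p model's proficiency could be scored next to a language's inventory sizes. The abstract does not say which metric the paper uses; phoneme error rate (edit distance over phoneme sequences) is assumed here because it is a common g2p metric. The function names and the toy lexicon are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(pairs: List[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Total edits divided by total reference length (assumed metric)."""
    total_edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total_len = sum(len(ref) for ref, _ in pairs)
    return total_edits / total_len if total_len else 0.0


def inventory_sizes(lexicon: List[Tuple[str, Sequence[str]]]) -> Tuple[int, int]:
    """Count distinct graphemes and phonemes in a (word, phonemes) lexicon."""
    graphemes = {ch for word, _ in lexicon for ch in word}
    phonemes = {p for _, phones in lexicon for p in phones}
    return len(graphemes), len(phonemes)


# Toy, made-up data: (spelling, reference phonemes) and one model prediction.
toy_lexicon = [("cat", ["k", "æ", "t"]), ("act", ["æ", "k", "t"])]
toy_predictions = [(["k", "æ", "t"], ["k", "a", "t"])]

print(inventory_sizes(toy_lexicon))
print(phoneme_error_rate(toy_predictions))
```

Comparing a per-language error rate of this kind with the two inventory sizes is one way to look for the relationship the abstract describes. Any comparison would also need to account for how much training data each language has.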

Comments:	Submitted and Accepted to NLPML 2022
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2202.00794 [cs.CL]
	(or arXiv:2202.00794v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2202.00794

Submission history

From: Domenic Rosati
[v1] Sat, 29 Jan 2022 14:44:39 UTC (449 KB)
[v2] Wed, 9 Feb 2022 19:10:44 UTC (457 KB)
