Unsupervised Phoneme and Word Discovery from Multiple Speakers using Double Articulation Analyzer and Neural Network with Parametric Bias

Nakashima, Ryo; Ozaki, Ryo; Taniguchi, Tadahiro

doi:10.3389/frobt.2019.00092

arXiv:1906.11049 (eess)

[Submitted on 21 Jun 2019]

Abstract: This paper describes a new unsupervised machine learning method for simultaneous phoneme and word discovery from multiple speakers. Human infants can acquire knowledge of phonemes and words through interactions with their mothers as well as with other people around them. From a computational perspective, phoneme and word discovery from multiple speakers is more challenging than discovery from a single speaker because speech signals from different speakers exhibit different acoustic features. This paper proposes an unsupervised phoneme and word discovery method that simultaneously uses a nonparametric Bayesian double articulation analyzer (NPB-DAA) and a deep sparse autoencoder with parametric bias in the hidden layer (DSAE-PBHL). We assume that an infant can recognize and distinguish speakers based on certain other features, e.g., visual face recognition. DSAE-PBHL is designed to subtract speaker-dependent acoustic features and extract speaker-independent features. One experiment demonstrated that DSAE-PBHL can subtract distributed representations of acoustic signals, enabling extraction based on the types of phonemes rather than on the speakers. Another experiment demonstrated that a combination of NPB-DAA and DSAE-PBHL outperformed the available methods in phoneme and word discovery tasks involving speech signals with Japanese vowel sequences from multiple speakers.

Comments:	21 pages. Submitted
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:1906.11049 [eess.AS]
	(or arXiv:1906.11049v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.1906.11049
Journal reference:	Front. Robot. AI, 2019, 6:92
Related DOI:	https://doi.org/10.3389/frobt.2019.00092

Submission history

From: Tadahiro Taniguchi
[v1] Fri, 21 Jun 2019 02:24:55 UTC (2,364 KB)
