Meta-learning for robust child-adult classification from speech

Koluguri, Nithin Rao; Kumar, Manoj; Kim, So Hyun; Lord, Catherine; Narayanan, Shrikanth

arXiv:1910.11400 (eess)

[Submitted on 24 Oct 2019 (v1), last revised 28 Oct 2019 (this version, v2)]

Abstract: Computational modeling of naturalistic conversations in clinical applications has seen growing interest in the past decade. An important use case involves child-adult interactions within the autism diagnosis and intervention domain. In this paper, we address a specific sub-problem of speaker diarization, namely child-adult speaker classification in such dyadic conversations with specified roles. Training a speaker classification system that is robust to speaker and channel conditions is challenging because of the inherent variability in the speech of children and their adult interlocutors. In this work, we propose the use of meta-learning, in particular prototypical networks, which optimize a metric space across multiple tasks. By modeling every child-adult pair in the training set as a separate task during meta-training, we learn a representation that generalizes better than one learned with conventional supervised learning. We demonstrate improvements over state-of-the-art speaker embeddings (x-vectors) under two evaluation settings: weakly supervised classification (up to 14.53% relative improvement in F1-scores) and clustering (up to 9.66% relative improvement in cluster purity). Our results show that protonets can potentially extract robust speaker embeddings for child-adult classification from speech.
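
As an illustration of the prototypical-network idea mentioned in the abstract, the following is a minimal sketch of a single episode, assuming one child-adult pair forms one task. It is not the authors' implementation: the toy 2-D vectors stand in for embeddings that a trained network would produce, the function names are hypothetical, and squared Euclidean distance is assumed as the metric. Each class prototype is the mean of its support embeddings, and a query is scored by a softmax over negative distances to the prototypes.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sq_euclidean(a, b):
    """Squared Euclidean distance (an assumed choice of metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototypes(support):
    """Map each class label to the mean of its support embeddings."""
    return {label: mean_vector(vecs) for label, vecs in support.items()}


def class_probabilities(query, protos):
    """Softmax over negative squared distances to each prototype."""
    logits = {label: -sq_euclidean(query, p) for label, p in protos.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - top) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def episode_loss(support, queries):
    """Mean negative log-likelihood over (embedding, label) query pairs."""
    protos = prototypes(support)
    nll = [-math.log(class_probabilities(q, protos)[y]) for q, y in queries]
    return sum(nll) / len(nll)


# Toy episode: hypothetical embeddings for one child-adult pair (one task).
support = {
    "child": [[1.0, 0.0], [0.8, 0.2]],
    "adult": [[0.0, 1.0], [0.2, 0.8]],
}
queries = [([0.9, 0.1], "child"), ([0.1, 0.9], "adult")]

protos = prototypes(support)
probs = class_probabilities([0.9, 0.1], protos)
predicted = max(probs, key=probs.get)
loss = episode_loss(support, queries)
print(predicted, round(loss, 4))
```

In meta-training, a loss of this kind would be averaged over many such episodes, each drawn from a different child-adult pair, and used to update the embedding network. That update step is omitted here.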

Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:1910.11400 [eess.AS]
	(or arXiv:1910.11400v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.1910.11400

Submission history

From: Nithin Rao Koluguri
[v1] Thu, 24 Oct 2019 19:55:24 UTC (1,010 KB)
[v2] Mon, 28 Oct 2019 20:43:53 UTC (934 KB)
