Improved acoustic-to-articulatory inversion using representations from pretrained self-supervised learning models

Udupa, Sathvik; C, Siddarth; Ghosh, Prasanta Kumar

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2210.16871 (eess)

[Submitted on 30 Oct 2022]


Abstract: In this work, we investigate the effectiveness of pretrained Self-Supervised Learning (SSL) features for learning the acoustic-to-articulatory inversion (AAI) mapping. Signal-processing-based acoustic features such as MFCCs have predominantly been used for AAI with deep neural networks. Since SSL features work well for various other speech tasks, such as speech recognition and emotion classification, we examine their efficacy for AAI. We train transformer-based AAI models of three different complexities on SSL features and compare their performance with that of MFCCs in subject-specific (SS), pooled, and fine-tuned (FT) configurations, using data from 10 subjects. We evaluate with the correlation coefficient (CC) on a test set of unseen sentences. We find that SSL features trained with an acoustic feature reconstruction objective, such as TERA and DeCoAR, work well for AAI: the SS CCs obtained with these features come close to the best FT CCs obtained with MFCCs. These results are consistent across the different model sizes.
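
The abstract reports performance as a correlation coefficient (CC) between measured and predicted articulatory trajectories. The sketch below assumes the convention common in AAI work: a Pearson CC, r = sum((x - mean_x)(y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2)), computed per articulatory channel and then averaged over channels. The abstract does not state how the authors average over channels, utterances, or subjects, so the helper names `pearson_cc` and `mean_cc` and the channel-major data layout are illustrative assumptions and do not describe the paper's evaluation code.

```python
import math
from statistics import fmean


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length trajectories."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("trajectories must have equal length >= 2")
    mx, my = fmean(x), fmean(y)
    dx = [a - mx for a in x]  # mean-removed ground truth
    dy = [b - my for b in y]  # mean-removed prediction
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0.0:
        raise ValueError("CC is undefined for a constant trajectory")
    return num / den


def mean_cc(true_traj, pred_traj):
    """Average per-channel CC for one utterance.

    Hypothetical layout: true_traj and pred_traj are lists of channels, and
    each channel is a list of per-frame values for one articulator coordinate.
    """
    if len(true_traj) != len(pred_traj):
        raise ValueError("channel count mismatch")
    return fmean([pearson_cc(t, p) for t, p in zip(true_traj, pred_traj)])


# Toy example with two channels (made-up numbers, not data from the paper).
true = [[0.0, 1.0, 2.0, 3.0], [1.0, 0.5, 0.0, -0.5]]
pred = [[0.1, 0.9, 2.2, 2.8], [2.0, 1.0, 0.0, -1.0]]
print(round(mean_cc(true, pred), 3))
```

Pearson CC ignores constant offsets and overall scale: the second toy channel is predicted at twice the true amplitude and still scores 1.0. This is one reason CC is often reported alongside an error measure such as RMSE in inversion studies.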

Comments:	submitted to ICASSP 2023
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2210.16871 [eess.AS]
	(or arXiv:2210.16871v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2210.16871

Submission history

From: Sathvik Udupa
[v1] Sun, 30 Oct 2022 16:24:02 UTC (186 KB)