Wavebender GAN: An architecture for phonetically meaningful speech manipulation

Beck, Gustavo Teodoro Döhler; Wennberg, Ulme; Malisz, Zofia; Henter, Gustav Eje

doi:10.1109/ICASSP43922.2022.9747442

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2202.10973 (eess)

[Submitted on 22 Feb 2022]

Title: Wavebender GAN: An architecture for phonetically meaningful speech manipulation

Authors: Gustavo Teodoro Döhler Beck, Ulme Wennberg, Zofia Malisz, Gustav Eje Henter

Abstract: Deep learning has revolutionised synthetic speech quality. However, it has thus far delivered little value to the speech science community. The new methods do not meet the controllability demands that practitioners in this area require, e.g., in listening tests with manipulated speech stimuli. Instead, control of different speech properties in such stimuli is achieved by using legacy signal-processing methods. This limits the range, accuracy, and speech quality of the manipulations. Also, audible artefacts have a negative impact on the methodological validity of results in speech perception studies.
This work introduces a system capable of manipulating speech properties through learning rather than design. The architecture learns to control arbitrary speech properties and leverages progress in neural vocoders to obtain realistic output. Experiments with copy synthesis and manipulation of a small set of core speech features (pitch, formants, and voice quality measures) illustrate the promise of the approach for producing speech stimuli that have accurate control and high perceptual quality.

Comments:	5 pages, 4 figures; to appear at ICASSP 2022
Subjects:	Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD)
MSC classes:	68T07
ACM classes:	I.2.7; I.2.6; J.5; H.5.5
Cite as:	arXiv:2202.10973 [eess.AS]
	(or arXiv:2202.10973v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2202.10973
Related DOI:	https://doi.org/10.1109/ICASSP43922.2022.9747442

Submission history

From: Gustav Eje Henter
[v1] Tue, 22 Feb 2022 15:26:26 UTC (399 KB)
