Integrating Uncertainty into Neural Network-based Speech Enhancement

Fang, Huajian; Becker, Dennis; Wermter, Stefan; Gerkmann, Timo

doi:10.1109/TASLP.2023.3265202

arXiv:2305.08744 (eess)

[Submitted on 15 May 2023]

Abstract: Supervised masking approaches in the time-frequency domain aim to employ deep neural networks to estimate a multiplicative mask to extract clean speech. This leads to a single estimate for each input without any guarantees or measures of reliability. In this paper, we study the benefits of modeling uncertainty in clean speech estimation. Prediction uncertainty is typically categorized into aleatoric uncertainty and epistemic uncertainty. The former refers to inherent randomness in data, while the latter describes uncertainty in the model parameters. In this work, we propose a framework to jointly model aleatoric and epistemic uncertainties in neural network-based speech enhancement. The proposed approach captures aleatoric uncertainty by estimating the statistical moments of the speech posterior distribution and explicitly incorporates the uncertainty estimate to further improve clean speech estimation. For epistemic uncertainty, we investigate two Bayesian deep learning approaches, Monte Carlo dropout and deep ensembles, to quantify the uncertainty of the neural network parameters. Our analyses show that the proposed framework promotes capturing practical and reliable uncertainty, while combining different sources of uncertainty yields more reliable predictive uncertainty estimates. Furthermore, we demonstrate the benefits of modeling uncertainty on speech enhancement performance by evaluating the framework on different datasets, exhibiting notable improvement over comparable models that fail to account for uncertainty.
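As an illustration of how the two kinds of uncertainty named in the abstract might be combined, the sketch below applies the law of total variance to several stochastic forward passes (for example, Monte Carlo dropout samples or deep ensemble members) for a single time-frequency bin. It is a minimal sketch under stated assumptions and is not taken from the paper: the function name `combine_uncertainty`, the real-valued per-bin mean/variance representation, and the example numbers are all hypothetical.

```python
from statistics import fmean, pvariance


def combine_uncertainty(means, variances):
    """Combine M stochastic forward passes for one time-frequency bin.

    means[m] and variances[m] are the predicted mean and variance from
    pass m (assumed here to be one MC-dropout sample or one ensemble
    member). This helper is hypothetical, not the authors' code.
    """
    if not means or len(means) != len(variances):
        raise ValueError("need equally sized, non-empty lists")

    # Predictive mean: average of the per-pass means.
    mean = fmean(means)
    # Aleatoric part: average of the variances each pass predicts.
    aleatoric = fmean(variances)
    # Epistemic part: spread of the per-pass means around their average.
    epistemic = pvariance(means, mu=mean)
    # Law of total variance: the two parts add up.
    return mean, aleatoric, epistemic, aleatoric + epistemic


# Hypothetical values for three forward passes.
mean, aleatoric, epistemic, total = combine_uncertainty(
    [0.9, 1.1, 1.0], [0.04, 0.06, 0.05]
)
print(mean, aleatoric, epistemic, total)
```

In this decomposition, disagreement between passes shows up only in the epistemic term, while the noise each pass attributes to the data shows up only in the aleatoric term.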

Comments:	Accepted version
Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:2305.08744 [eess.AS]
	(or arXiv:2305.08744v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2305.08744
Journal reference:	IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 1587-1600, 2023
Related DOI:	https://doi.org/10.1109/TASLP.2023.3265202

Submission history

From: Huajian Fang
[v1] Mon, 15 May 2023 15:55:12 UTC (1,281 KB)
