A non-causal FFTNet architecture for speech enhancement

Shifas, Muhammed PV; Adiga, Nagaraj; Tsiaras, Vassilis; Stylianou, Yannis

doi:10.21437/Interspeech.2019-2622

arXiv:2006.04469 (eess)

[Submitted on 8 Jun 2020]


Abstract: In this paper, we propose a new parallel, non-causal, and shallow waveform-domain architecture for speech enhancement based on FFTNet, a neural network for generating high-quality audio waveforms. In contrast to other waveform-based approaches such as WaveNet, FFTNet uses an initial wide dilation pattern. Such an architecture better represents the long-term correlated structure of speech in the time domain, where noise is usually largely uncorrelated, and is therefore well suited to waveform-domain speech enhancement. To further strengthen this feature of FFTNet, we propose a non-causal FFTNet architecture, in which the present sample in each layer is estimated from the past and future samples of the previous layer. By using a shallow network and applying non-causality within certain limits, the proposed FFTNet for speech enhancement (SE-FFTNet) uses far fewer parameters than other neural-network-based approaches to speech enhancement such as WaveNet and SEGAN: 32% fewer than WaveNet and 87% fewer than SEGAN. Finally, based on subjective and objective metrics, SE-FFTNet outperforms WaveNet in terms of enhanced signal quality, while performing on par with SEGAN. A TensorFlow implementation of the architecture is provided at 1.
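The two structural ideas in the abstract, a dilation pattern that starts wide and a layer that looks at both past and future samples, can be illustrated with a small sketch. This is not the authors' TensorFlow implementation. It assumes a hypothetical three-tap layer (past, current, and future sample) with scalar weights, zero padding, and a ReLU, and it assumes the dilation halves at every layer. All function names are invented for illustration.

```python
def wide_first_dilations(depth):
    """Assumed pattern: widest dilation first, halving per layer (depth=3 -> [4, 2, 1])."""
    return [2 ** (depth - 1 - i) for i in range(depth)]


def noncausal_layer(x, dilation, w_past=1.0, w_center=1.0, w_future=1.0):
    """Hypothetical non-causal layer: each output sample combines the input
    sample `dilation` steps in the past, the current sample, and the sample
    `dilation` steps in the future. Out-of-range samples are treated as zero."""
    n = len(x)
    out = []
    for t in range(n):
        past = x[t - dilation] if t - dilation >= 0 else 0.0
        future = x[t + dilation] if t + dilation < n else 0.0
        z = w_past * past + w_center * x[t] + w_future * future
        out.append(max(z, 0.0))  # ReLU non-linearity
    return out


def run_stack(x, dilations):
    """Apply one non-causal layer per dilation, in order."""
    for d in dilations:
        x = noncausal_layer(x, d)
    return x


def receptive_field(dilations):
    """Each symmetric layer adds `dilation` samples of context on both sides."""
    return 1 + 2 * sum(dilations)


if __name__ == "__main__":
    dilations = wide_first_dilations(3)
    impulse = [0.0] * 21
    impulse[10] = 1.0
    response = run_stack(impulse, dilations)
    touched = [i for i, v in enumerate(response) if v != 0.0]
    print(dilations, receptive_field(dilations), touched)
```

Feeding an impulse through the stack shows which output samples depend on a single input sample: the span is symmetric around the impulse, which is what distinguishes the non-causal layer from a causal one that would only spread the impulse forward in time.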

Comments:	5 pages
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Cite as:	arXiv:2006.04469 [eess.AS]
	(or arXiv:2006.04469v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2006.04469
Related DOI:	https://doi.org/10.21437/Interspeech.2019-2622

Submission history

From: Muhammed Shifas Pv
[v1] Mon, 8 Jun 2020 10:49:04 UTC (269 KB)
