On Data Sampling Strategies for Training Neural Network Speech Separation Models

Ravenscroft, William; Goetze, Stefan; Hain, Thomas

arXiv:2304.07142 (cs)

[Submitted on 14 Apr 2023 (v1), last revised 16 Jun 2023 (this version, v2)]

Abstract: Speech separation remains an important area of multi-speaker signal processing. Deep neural network (DNN) models have attained the best performance on many speech separation benchmarks, but some of these models take significant time to train and have high memory requirements. Previous work has proposed shortening training examples to address these issues, but the impact of this on model performance is not yet well understood. In this work, the impact of applying these training signal length (TSL) limits is analysed for two speech separation models: SepFormer, a transformer model, and Conv-TasNet, a convolutional model. The WSJ0-2Mix, WHAMR and Libri2Mix datasets are analysed in terms of signal length distribution and its impact on training efficiency. It is demonstrated that, for specific signal length distributions, applying specific TSL limits results in better performance. This is shown to be mainly due to randomly sampling the start index of the waveforms, which results in more unique examples for training. A SepFormer model trained using a TSL limit of 4.42s and dynamic mixing (DM) is shown to match the best-performing SepFormer model trained with DM and unlimited signal lengths. Furthermore, the 4.42s TSL limit results in a 44% reduction in training time with WHAMR.
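
The random start index sampling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `crop_to_tsl`, the list-based signal representation, and the 8 kHz sample rate are assumptions made purely for illustration. The idea shown is that a signal longer than the TSL limit is cropped at a start index drawn afresh on each call, so the same recording can yield different training segments across epochs, while shorter signals pass through unchanged.

```python
import random

def crop_to_tsl(mixture, sources, tsl_samples, rng=random):
    """Crop a mixture and its sources to at most tsl_samples samples.

    A single random start index is shared by the mixture and all
    sources so that they stay time-aligned. (Hypothetical helper,
    not taken from the paper.)
    """
    n = len(mixture)
    if n <= tsl_samples:
        # Signals already within the limit are returned unchanged.
        return mixture, sources
    # Draw a start index so the crop fits entirely inside the signal.
    start = rng.randrange(0, n - tsl_samples + 1)
    end = start + tsl_samples
    return mixture[start:end], [s[start:end] for s in sources]

SAMPLE_RATE = 8000   # assumption: 8 kHz audio, not stated in the abstract
TSL_SECONDS = 4.42   # TSL limit quoted in the abstract
tsl_samples = int(round(TSL_SECONDS * SAMPLE_RATE))
```

Calling `crop_to_tsl` once per example per epoch, rather than fixing the crop in advance, is what would produce the larger number of unique training examples that the abstract credits for the performance gain.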

Comments:	Accepted for EUSIPCO 2023
Subjects:	Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2304.07142 [cs.SD]
	(or arXiv:2304.07142v2 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2304.07142

Submission history

From: William Ravenscroft
[v1] Fri, 14 Apr 2023 14:05:52 UTC (1,815 KB)
[v2] Fri, 16 Jun 2023 13:42:10 UTC (1,815 KB)
