MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation Invariant Training

Karamatlı, Ertuğ; Kırbız, Serap

doi:10.1109/LSP.2022.3232276

arXiv:2202.03875 (eess)

[Submitted on 8 Feb 2022 (v1), last revised 6 Jan 2023 (this version, v2)]

Abstract: We introduce two unsupervised source separation methods, which involve self-supervised training from single-channel two-source speech mixtures. Our first method, mixture permutation invariant training (MixPIT), enables learning a neural network model which separates the underlying sources via a challenging proxy task without supervision from the reference sources. Our second method, cyclic mixture permutation invariant training (MixCycle), uses MixPIT as a building block in a cyclic fashion for continuous learning. MixCycle gradually converts the problem from separating mixtures of mixtures into separating single mixtures. We compare our methods to common supervised and unsupervised baselines: permutation invariant training with dynamic mixing (PIT-DM) and mixture invariant training (MixIT). We show that MixCycle outperforms MixIT and reaches a performance level very close to the supervised baseline (PIT-DM) while circumventing the over-separation issue of MixIT. Also, we propose a self-evaluation technique inspired by MixCycle that estimates model performance without utilizing any reference sources. We show that it yields results consistent with an evaluation on reference sources (LibriMix) and also with an informal listening test conducted on a real-life mixtures dataset (REAL-M).
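The abstract does not spell out the training objective, so the following is only a rough sketch of what a permutation invariant loss over a mixture of mixtures could look like. It assumes that the proxy task feeds the sum of two mixtures to a two-output model and scores the outputs against the two original mixtures under the best assignment. The mean squared error, the function names (`pit_loss`, `mixpit_style_loss`) and the `model` callable are illustrative assumptions, not the authors' implementation.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

Signal = List[float]


def mse(estimate: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length signals (assumed loss)."""
    if len(estimate) != len(target):
        raise ValueError("signals must have the same length")
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)


def pit_loss(
    estimates: Sequence[Signal],
    targets: Sequence[Signal],
    loss_fn: Callable[[Sequence[float], Sequence[float]], float] = mse,
) -> Tuple[float, Tuple[int, ...]]:
    """Return the lowest mean loss over all estimate-to-target assignments.

    The returned permutation maps estimate index i to target index perm[i].
    """
    if len(estimates) != len(targets):
        raise ValueError("need as many estimates as targets")
    best_loss = float("inf")
    best_perm: Tuple[int, ...] = ()
    for perm in permutations(range(len(targets))):
        loss = sum(
            loss_fn(estimates[i], targets[p]) for i, p in enumerate(perm)
        ) / len(perm)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


def mixture_of_mixtures(mix_a: Signal, mix_b: Signal) -> Signal:
    """Sum two single-channel mixtures sample by sample."""
    return [a + b for a, b in zip(mix_a, mix_b)]


def mixpit_style_loss(
    model: Callable[[Signal], Sequence[Signal]],
    mix_a: Signal,
    mix_b: Signal,
) -> float:
    """Hypothetical proxy objective: recover the two input mixtures
    from their sum, in either output order."""
    estimates = model(mixture_of_mixtures(mix_a, mix_b))
    loss, _ = pit_loss(estimates, [mix_a, mix_b])
    return loss
```

In this sketch the references are the mixtures themselves, so no isolated source signals are needed to compute the loss, which is the property the abstract attributes to MixPIT.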

Comments:	5 pages. This article has been accepted for publication in IEEE Signal Processing Letters
Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
Cite as:	arXiv:2202.03875 [eess.AS]
	(or arXiv:2202.03875v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2202.03875
Journal reference:	IEEE Signal Processing Letters 29 (2022) 2637-2641
Related DOI:	https://doi.org/10.1109/LSP.2022.3232276

Submission history

From: Ertuğ Karamatlı
[v1] Tue, 8 Feb 2022 14:02:50 UTC (143 KB)
[v2] Fri, 6 Jan 2023 21:15:02 UTC (89 KB)
