MFA: TDNN with Multi-scale Frequency-channel Attention for Text-independent Speaker Verification with Short Utterances

Liu, Tianchi; Das, Rohan Kumar; Lee, Kong Aik; Li, Haizhou

arXiv:2202.01624v2 (cs)

[Submitted on 3 Feb 2022 (v1), revised 4 Feb 2022 (this version, v2), latest version 15 Feb 2022 (v3)]

Abstract: The time delay neural network (TDNN) is one of the state-of-the-art neural solutions to text-independent speaker verification. However, TDNNs require a large number of filters to capture speaker characteristics in any local frequency region. In addition, the performance of such systems may degrade in short-utterance scenarios. To address these issues, we propose multi-scale frequency-channel attention (MFA), which characterizes speakers at different scales through a novel dual-path design consisting of a convolutional neural network and a TDNN. We evaluate the proposed MFA on the VoxCeleb database and observe that the proposed framework with MFA can achieve state-of-the-art performance while reducing the number of parameters and the computational complexity. Further, the MFA mechanism is found to be effective for speaker verification with short test utterances.

Comments:	Accepted by ICASSP 2022
Subjects:	Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
Cite as:	arXiv:2202.01624 [cs.SD]
	(or arXiv:2202.01624v2 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2202.01624

Submission history

From: Tianchi Liu
[v1] Thu, 3 Feb 2022 14:57:05 UTC (2,039 KB)
[v2] Fri, 4 Feb 2022 15:39:24 UTC (2,037 KB)
[v3] Tue, 15 Feb 2022 17:09:04 UTC (1,022 KB)
