Analysis of constant-Q filterbank based representations for speech emotion recognition

Singh, Premjeet; Waldekar, Shefali; Sahidullah, Md; Saha, Goutam

doi:10.1016/j.dsp.2022.103712

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2211.16363 (eess)

[Submitted on 29 Nov 2022]



Abstract: This work analyzes constant-Q filterbank-based time-frequency representations for speech emotion recognition (SER). A constant-Q filterbank provides a non-linear spectro-temporal representation with higher frequency resolution at low frequencies. Our investigation reveals how this increased low-frequency resolution benefits SER. A time-domain comparative analysis between short-term mel-frequency spectral coefficients (MFSCs) and constant-Q filterbank-based features, namely the constant-Q transform (CQT) and the continuous wavelet transform (CWT), shows that constant-Q representations provide higher time-invariance at low frequencies. This increases robustness against emotion-irrelevant temporal variations in pitch, especially for low-arousal emotions. The corresponding frequency-domain analysis over different emotion classes indicates that constant-Q-based time-frequency representations resolve pitch harmonics better than MFSCs. These advantages of constant-Q representations are further supported by SER performance in an extensive evaluation of the features over four publicly available databases, using six advanced deep neural network architectures as back-end classifiers. Our findings suggest the suitability and potential of constant-Q features for SER.
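The claim that a constant-Q filterbank has higher frequency resolution at low frequencies than a mel filterbank can be checked with a few lines of arithmetic. The sketch below is a minimal illustration, assuming the textbook constant-Q definitions (geometric centre frequencies f_k = f_min * 2^(k/B) and a constant quality factor Q = 1 / (2^(1/B) - 1)) and the common HTK-style mel formula. The parameter values (FS, F_MIN, BINS_PER_OCTAVE, N_MELS) and all function names are hypothetical choices for illustration; they are not taken from the paper, and this is not the authors' feature-extraction pipeline.

```python
import math

# Hypothetical settings for illustration only (not taken from the paper):
FS = 16000            # sampling rate in Hz
F_MIN = 32.7          # lowest CQT centre frequency (about C1) in Hz
BINS_PER_OCTAVE = 12  # B, number of CQT bins per octave
N_MELS = 80           # mel bands spread uniformly on the mel scale over [0, FS/2]


def cqt_center_freqs(f_min, bins_per_octave, n_bins):
    """Geometrically spaced centre frequencies f_k = f_min * 2**(k / B)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_q_factor(bins_per_octave):
    """Quality factor Q = f_k / (f_{k+1} - f_k), identical for every bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


def cqt_bandwidth(f_k, bins_per_octave):
    """Spacing to the next CQT bin, f_k / Q: grows linearly with f_k."""
    return f_k / cqt_q_factor(bins_per_octave)


def cqt_window_length(f_k, q, fs):
    """Analysis window length in samples, N_k = Q * fs / f_k (longest at low f_k)."""
    return q * fs / f_k


def hz_to_mel(f):
    """HTK-style mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_spacing(f, n_mels, f_max):
    """Hz distance from f to the point one mel-band step above it, with
    n_mels band centres placed uniformly on the mel scale over [0, f_max]."""
    step = hz_to_mel(f_max) / (n_mels + 1)
    return mel_to_hz(hz_to_mel(f) + step) - f


if __name__ == "__main__":
    q = cqt_q_factor(BINS_PER_OCTAVE)
    n_bins = int(BINS_PER_OCTAVE * math.log2((FS / 2) / F_MIN))
    freqs = cqt_center_freqs(F_MIN, BINS_PER_OCTAVE, n_bins)
    print(f"Q = {q:.2f}; {n_bins} CQT bins from {freqs[0]:.1f} to {freqs[-1]:.1f} Hz")
    for f in (100.0, 250.0, 1000.0, 4000.0):
        print(
            f"{f:7.0f} Hz | CQT spacing {cqt_bandwidth(f, BINS_PER_OCTAVE):7.1f} Hz"
            f" | mel spacing {mel_spacing(f, N_MELS, FS / 2):6.1f} Hz"
            f" | CQT window {1000.0 * cqt_window_length(f, q, FS) / FS:6.1f} ms"
        )
```

With these assumed settings, the CQT bin spacing is several times finer than the mel spacing at 100 Hz (where pitch harmonics lie) and coarser at 4 kHz, while the CQT analysis window becomes much longer at low frequencies. The longer low-frequency windows offer one plausible intuition for the smoother, more time-invariant low-frequency bands described in the abstract; the paper's own analysis should be consulted for the actual evidence.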

Comments:	Accepted for publication in Elsevier's Digital Signal Processing Journal
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2211.16363 [eess.AS]
	(or arXiv:2211.16363v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2211.16363
Journal reference:	Digital Signal Processing, Volume 130, October 2022, 103712
Related DOI:	https://doi.org/10.1016/j.dsp.2022.103712

Submission history

From: Premjeet Singh
[v1] Tue, 29 Nov 2022 16:45:47 UTC (6,214 KB)
