Understanding Audio Features via Trainable Basis Functions

Heung, Kwan Yee; Cheuk, Kin Wai; Herremans, Dorien

Abstract:In this paper we explore the possibility of maximizing the information represented in spectrograms by making the spectrogram basis functions trainable. We experiment with two different tasks, namely keyword spotting (KWS) and automatic speech recognition (ASR). For most neural network models, the architecture and hyperparameters are typically fine-tuned and optimized in experiments. Input features, however, are often treated as fixed. In the case of audio, signals can be mainly expressed in two main ways: raw waveforms (time-domain) or spectrograms (time-frequency-domain). In addition, different spectrogram types are often used and tailored to fit different applications. In our experiments, we allow for this tailoring directly as part of the network.
Our experimental results show that using trainable basis functions can boost the accuracy of Keyword Spotting (KWS) by 14.2 percentage points, and lower the Phone Error Rate (PER) by 9.5 percentage points. Although models using trainable basis functions become less effective as the model complexity increases, the trained filter shapes could still provide us with insights on which frequency bins are important for that specific task. From our experiments, we can conclude that trainable basis functions are a useful tool to boost the performance when the model complexity is limited.

Comments:	under review in Interspeech 2022
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
Cite as:	arXiv:2204.11437 [cs.SD]
	(or arXiv:2204.11437v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2204.11437

🚨2024-09-29: arxiv.org is experience DB issues. The announce tonight will be 3 hours later than usual.🚨

Computer Science > Sound

Title:Understanding Audio Features via Trainable Basis Functions

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators