End-to-end learning for music audio tagging at scale

Pons, Jordi; Nieto, Oriol; Prockup, Matthew; Schmidt, Erik; Ehmann, Andreas; Serra, Xavier

arXiv:1711.02520 (cs)

[Submitted on 7 Nov 2017 (v1), last revised 15 Jun 2018 (this version, v4)]

Abstract: The lack of data tends to limit the outcomes of deep learning research, particularly when dealing with end-to-end learning stacks processing raw data such as waveforms. In this study, 1.2M tracks annotated with musical labels are available to train our end-to-end models. This large amount of data allows us to unrestrictedly explore two different design paradigms for music auto-tagging: assumption-free models - using waveforms as input with very small convolutional filters; and models that rely on domain knowledge - log-mel spectrograms with a convolutional neural network designed to learn timbral and temporal features. Our work focuses on studying how these two types of deep architectures perform when datasets of variable size are available for training: the MagnaTagATune (25k songs), the Million Song Dataset (240k songs), and a private dataset of 1.2M songs. Our experiments suggest that music domain assumptions are relevant when not enough training data are available, and that waveform-based models outperform spectrogram-based ones in large-scale data scenarios.
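
As a rough illustration of the waveform paradigm mentioned in the abstract, the sketch below computes how much of the input a stack of very small filters can see. The filter length of 3, the pooling size of 3, and the depths shown are assumptions chosen for illustration, not values stated in the abstract, and `receptive_field` is a hypothetical helper rather than part of the authors' code.

```python
def receptive_field(layers):
    """Return (receptive_field, hop) in input samples for a stack of 1D layers.

    `layers` is a list of (kernel_size, stride) pairs, applied in order.
    """
    rf, hop = 1, 1
    for kernel_size, stride in layers:
        # Each layer widens the view by (kernel_size - 1) steps of the current hop.
        rf += (kernel_size - 1) * hop
        # Strided layers increase the spacing between consecutive outputs.
        hop *= stride
    return rf, hop


# Assumed building block: a length-3 convolution (stride 1)
# followed by a length-3 max-pool (stride 3).
block = [(3, 1), (3, 3)]

for depth in (1, 3, 7):
    rf, hop = receptive_field(block * depth)
    print(f"{depth} blocks: receptive field = {rf} samples, hop = {hop} samples")
```

Under these assumptions, each individual filter spans only three samples, yet the receptive field grows roughly threefold with every added block, which is how a deep stack of small filters can cover a long stretch of waveform without hand-designed front-end features.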

Comments:	Presented at the Workshop on Machine Learning for Audio Signal Processing (ML4Audio) at NIPS 2017, and in proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR2018). Code: this https URL. Demo: this http URL
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:1711.02520 [cs.SD]
	(or arXiv:1711.02520v4 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.1711.02520

Submission history

From: Jordi Pons M.Sc.
[v1] Tue, 7 Nov 2017 15:10:39 UTC (441 KB)
[v2] Wed, 15 Nov 2017 16:55:54 UTC (441 KB)
[v3] Thu, 16 Nov 2017 09:10:05 UTC (441 KB)
[v4] Fri, 15 Jun 2018 08:04:37 UTC (462 KB)
