Attention as a Perspective for Learning Tempo-invariant Audio Queries

Dorfer, Matthias; Hajič Jr., Jan; Widmer, Gerhard

arXiv:1809.05689 (cs)

[Submitted on 15 Sep 2018]

Abstract: Current models for audio–sheet music retrieval via multimodal embedding space learning use convolutional neural networks with a fixed-size window for the input audio. Depending on the tempo of a query performance, this window captures more or less musical content, while notehead density in the score is largely tempo-independent. In this work we address this disparity with a soft attention mechanism, which allows the model to encode only those parts of an audio excerpt that are most relevant with respect to efficient query codes. Empirical results on classical piano music indicate that attention is beneficial for retrieval performance, and exhibits intuitively appealing behavior.
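
As a rough illustration of the soft attention idea described in the abstract, the sketch below weights each time frame of a toy spectrogram by a softmax over per-frame scores, so that frames with low scores contribute little to whatever encoder follows. This is only a minimal sketch under stated assumptions, not the authors' architecture: the function names `softmax` and `attend_over_time`, the toy spectrogram, and the hand-picked scores are all hypothetical, and in a trained model the scores would presumably come from a learned network rather than being fixed by hand.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_time(spectrogram, frame_scores):
    """Scale every time frame of a spectrogram by its attention weight.

    spectrogram:  list of rows (frequency bins), each a list over time frames
    frame_scores: one unnormalized score per time frame (hypothetical input;
                  a real model would predict these from the audio)
    """
    weights = softmax(frame_scores)  # non-negative, sums to 1 over time
    attended = [[value * w for value, w in zip(row, weights)]
                for row in spectrogram]
    return attended, weights


# Toy example: 2 frequency bins x 4 time frames (assumed sizes).
spec = [[1.0, 1.0, 1.0, 1.0],
        [2.0, 2.0, 2.0, 2.0]]
scores = [0.0, 3.0, 3.0, 0.0]  # the middle frames are deemed most relevant

attended, weights = attend_over_time(spec, scores)
```

Because the weights form a distribution over time, a window that covers too much or too little musical content can in principle be narrowed or widened softly, which is the intuition the abstract gives for handling tempo differences.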

Comments:	The 2018 Joint Workshop on Machine Learning for Music
Subjects:	Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:1809.05689 [cs.SD]
	(or arXiv:1809.05689v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.1809.05689

Submission history

From: Matthias Dorfer
[v1] Sat, 15 Sep 2018 10:03:15 UTC (1,498 KB)
