Can Impressions of Music be Extracted from Thumbnail Images?

Harada, Takashi; Motomitsu, Takehiro; Hayashi, Katsuhiko; Sakai, Yusuke; Kamigaito, Hidetaka

arXiv:2501.02511 (cs)

[Submitted on 5 Jan 2025]

Abstract: In recent years, there has been a notable increase in research on machine learning models for music retrieval and generation systems that take natural language sentences as input. However, there is a scarcity of large-scale, publicly available datasets consisting of music data paired with natural language descriptions, known as music captions. In particular, non-musical information, such as suitable situations for listening to a track and the emotions elicited upon listening, is crucial for describing music. This type of information is underrepresented in existing music caption datasets because it is difficult to extract directly from music data. To address this issue, we propose a method for generating music caption data that incorporates non-musical aspects inferred from music thumbnail images, and we validate the effectiveness of our approach through human evaluations. Additionally, we created a dataset of approximately 360,000 captions containing non-musical aspects. Using this dataset, we trained a music retrieval model and demonstrated its effectiveness in music retrieval tasks through evaluation.

Comments:	Accepted at NLP4MusA 2024
Subjects:	Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2501.02511 [cs.CL]
	(or arXiv:2501.02511v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2501.02511

Submission history

From: Katsuhiko Hayashi
[v1] Sun, 5 Jan 2025 11:51:38 UTC (548 KB)
