MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing

Huang, Yu-Fen; Moran, Nikki; Coleman, Simon; Kelly, Jon; Wei, Shun-Hwa; Chen, Po-Yin; Huang, Yun-Hsin; Chen, Tsung-Ping; Kuo, Yu-Chia; Wei, Yu-Chi; Li, Chih-Hsuan; Huang, Da-Yu; Kao, Hsuan-Kai; Lin, Ting-Wei; Su, Li

doi:10.1109/TASLP.2024.3407529

arXiv:2406.06375 (cs)

[Submitted on 10 Jun 2024]

Abstract: In cross-modal music processing, translation between visual, auditory, and semantic content opens up new possibilities as well as challenges. The construction of such a transformative scheme depends upon a benchmark corpus with a comprehensive data infrastructure. In particular, the assembly of a large-scale cross-modal dataset presents major challenges. In this paper, we present the MOSA (Music mOtion with Semantic Annotation) dataset, which contains high-quality 3-D motion capture data, aligned audio recordings, and note-by-note semantic annotations of pitch, beat, phrase, dynamic, articulation, and harmony for 742 professional music performances by 23 professional musicians, comprising more than 30 hours and 570K notes of data. To our knowledge, this is the largest cross-modal music dataset with note-level annotations to date. To demonstrate the usage of the MOSA dataset, we present several innovative cross-modal music information retrieval (MIR) and musical content generation tasks, including the detection of beats, downbeats, phrases, and expressive content from audio, video, and motion data, and the generation of musicians' body motion from given music audio. The dataset and code are available alongside this publication (this https URL).

Comments:	IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024. 14 pages, 7 figures. Dataset is available on: this https URL and this https URL
Subjects:	Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2406.06375 [cs.SD]
	(or arXiv:2406.06375v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2406.06375
Related DOI:	https://doi.org/10.1109/TASLP.2024.3407529

Submission history

From: Yu-Fen Huang
[v1] Mon, 10 Jun 2024 15:37:46 UTC (3,503 KB)
