We introduce the Music Ternary Modalities Dataset (MTM Dataset), created by our group to support learning joint representations across three music modalities in music information retrieval (MIR), including three types of cross-modal retrieval. Learning joint representations for cross-modal retrieval among three modalities has been held back by the limited availability of large datasets spanning three or more modalities. The MTM Dataset aims to overcome this constraint by extending music notes to sheet music and music audio and by building fine-grained alignments between music notes and lyric syllables, so that the dataset can be used to learn joint representations across multimodal music data. The MTM Dataset provides three modalities, sheet music, lyrics, and music audio, together with features extracted by pre-trained models. In this paper, we describe the dataset and how it was built, and we evaluate several baselines for cross-modal retrieval tasks. The dataset and usage examples are available at https://github.com/MorningBooks/MTM-Dataset.
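As a rough sketch of how such data might be consumed, the snippet below defines a hypothetical per-sample record covering the three modalities, their pre-extracted features, and a note-syllable alignment, plus a cosine-similarity ranking function of the kind commonly used as a cross-modal retrieval baseline. The field names and the `MTMSample` / `cosine_retrieval` helpers are illustrative assumptions, not the repository's actual schema or API.

```python
# Illustrative sketch only: the actual file layout and field names in the
# MTM Dataset repository (https://github.com/MorningBooks/MTM-Dataset) may differ.
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np


@dataclass
class MTMSample:
    """One aligned item across the three modalities (hypothetical schema)."""
    sheet_image_path: str                        # rendered sheet-music image
    lyrics: str                                  # lyric text for the excerpt
    audio_path: str                              # corresponding audio clip
    note_syllable_alignment: List[Tuple[int, int]]  # (note_index, syllable_index) pairs
    sheet_feature: np.ndarray                    # feature from a pre-trained vision model
    lyrics_feature: np.ndarray                   # feature from a pre-trained text model
    audio_feature: np.ndarray                    # feature from a pre-trained audio model


def cosine_retrieval(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """Rank gallery items by cosine similarity to a query feature.

    A minimal cross-modal retrieval baseline: the query comes from one
    modality (e.g. lyrics) and the gallery from another (e.g. audio).
    """
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = g @ q
    return np.argsort(-scores)  # gallery indices, best match first
```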