There has been a rapid growth of digitally available music data, including audio recordings, digitized images of sheet music, album covers and liner notes, and video clips. This huge amount of data calls for retrieval strategies that allow users to explore large music collections in a convenient way. More precisely, there is a need for cross-modal retrieval algorithms that, given a query in one modality (e.g., a short audio excerpt), find corresponding information and entities in other modalities (e.g., the name of the piece and the sheet music). This goes beyond exact audio identification and subsequent retrieval of metainformation as performed by commercial applications like Shazam [1].