Music Structure Analysis (MSA) consists of segmenting a music piece into several distinct sections. We approach MSA within a compression framework, under the hypothesis that the structure is more easily revealed by a simplified representation of the original content of the song. More specifically, under the hypothesis that musical structure is correlated with similarities occurring at the bar scale, this article introduces the use of linear and non-linear compression schemes on barwise audio signals. The compressed representations capture the most salient components of the different bars in the song and are then used to infer the song structure with a dynamic programming algorithm. This work explores both low-rank approximation models, such as Principal Component Analysis and Nonnegative Matrix Factorization, and "piece-specific" Auto-Encoding Neural Networks, with the objective of learning latent representations specific to a given song. Such approaches rely on neither supervision nor annotations, which are well known to be tedious to collect and potentially ambiguous in MSA. In our experiments, several unsupervised compression schemes achieve a level of performance comparable to that of state-of-the-art supervised methods (with a 3 s tolerance) on the RWC-Pop dataset, showcasing the importance of barwise compression for MSA.
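To make the pipeline described above more concrete, the following is a minimal sketch of one possible linear variant: barwise features are compressed with PCA, a bar-to-bar self-similarity matrix is built from the compressed bars, and a dynamic programming pass picks segment boundaries. It is an illustration under stated assumptions, not the implementation evaluated in the article. The function names (`pca_compress`, `self_similarity`, `segment_bars`), the cosine similarity, the block score (sum of within-segment similarities divided by segment length) and the per-segment `penalty` are all hypothetical choices made for this example, and the input is assumed to be a matrix with one row of features per bar.

```python
import numpy as np


def pca_compress(barwise, rank):
    """Project each bar (one row per bar) onto the top `rank` principal components."""
    centered = barwise - barwise.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions of the centered barwise matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:rank].T


def self_similarity(latent):
    """Cosine similarity between every pair of compressed bars."""
    norms = np.linalg.norm(latent, axis=1, keepdims=True)
    unit = latent / np.maximum(norms, 1e-12)  # guard against all-zero rows
    return unit @ unit.T


def segment_bars(ssm, max_len=32, penalty=1.0):
    """Choose boundaries maximising the summed segment scores (illustrative score).

    A segment [start, end) scores the sum of its similarity block divided by
    its length, minus `penalty`, which discourages over-segmentation.
    """
    n = ssm.shape[0]
    best = np.full(n + 1, -np.inf)  # best[e]: best total score for bars [0, e)
    best[0] = 0.0
    prev = np.zeros(n + 1, dtype=int)  # back-pointers to the previous boundary
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            block = ssm[start:end, start:end]
            score = best[start] + block.sum() / (end - start) - penalty
            if score > best[end]:
                best[end] = score
                prev[end] = start
    # Walk the back-pointers from the last bar to recover the boundaries.
    boundaries = [n]
    while boundaries[-1] > 0:
        boundaries.append(int(prev[boundaries[-1]]))
    return boundaries[::-1]


if __name__ == "__main__":
    # Toy input: 8 bars of one pattern followed by 8 bars of another, plus noise.
    rng = np.random.default_rng(0)
    pattern_a, pattern_b = rng.random(24), rng.random(24)
    bars = np.vstack([np.tile(pattern_a, (8, 1)), np.tile(pattern_b, (8, 1))])
    bars = bars + 0.01 * rng.standard_normal(bars.shape)
    print(segment_bars(self_similarity(pca_compress(bars, rank=2))))
```

On the toy input, the two groups of bars are expected to be separated at bar 8. Swapping PCA for NMF or for a small auto-encoder trained on the single song would only change `pca_compress`; the similarity and segmentation steps would stay the same in this sketch.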