Music Structure Analysis (MSA) consists of segmenting a piece of music into several distinct sections. We approach MSA within a compression framework, under the hypothesis that the structure is more easily revealed by a simplified representation of the original content of the song. More specifically, under the hypothesis that musical structure is correlated with similarities occurring at the bar scale, linear and non-linear compression schemes can be applied to barwise audio signals. The compressed representations capture the most salient components of the different bars in the song and are then used to infer the song structure with a dynamic programming algorithm. This work explores both low-rank approximation models, such as Principal Component Analysis and Nonnegative Matrix Factorization, and "piece-specific" Auto-Encoding Neural Networks, with the objective of learning latent representations specific to a given song. Such approaches rely on neither supervision nor annotations, which are well known to be tedious to collect and potentially ambiguous in MSA. In our experiments, several unsupervised compression schemes achieve a level of performance comparable to that of state-of-the-art supervised methods (at a 3 s tolerance) on the RWC-Pop dataset, which highlights the importance of barwise compression for MSA.
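The pipeline described in the abstract (barwise features, compression, then dynamic programming) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the song is already available as a matrix with one feature vector per bar, uses PCA computed through an SVD as the compression scheme, and scores candidate segments by their average cosine self-similarity minus a fixed per-segment penalty. The function names, the scoring rule, and the `max_len` and `penalty` parameters are all hypothetical choices made for illustration.

```python
import numpy as np


def compress_bars_pca(bars, rank):
    """Project barwise features (n_bars x n_features) onto the top
    `rank` principal components. Hypothetical helper, not the paper's code."""
    centered = bars - bars.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :rank] * s[:rank]


def cosine_self_similarity(latent):
    """Cosine similarity between every pair of compressed bars."""
    norms = np.linalg.norm(latent, axis=1, keepdims=True)
    unit = latent / np.maximum(norms, 1e-12)
    return unit @ unit.T


def segment_bars(ssm, max_len=8, penalty=1.0):
    """Dynamic programming over bar boundaries.

    Assumed scoring rule: a segment [start, end) is worth the sum of its
    block of the self-similarity matrix divided by its length, minus a
    constant penalty that discourages over-segmentation.
    Returns the boundary indices, starting at 0 and ending at n_bars.
    """
    n = ssm.shape[0]
    best = np.full(n + 1, -np.inf)  # best[j]: best score for bars [0, j)
    best[0] = 0.0
    prev = np.zeros(n + 1, dtype=int)  # back-pointers to the previous boundary
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            block = ssm[start:end, start:end]
            score = block.sum() / (end - start) - penalty
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                prev[end] = start
    # Walk the back-pointers from the last bar to recover the boundaries.
    boundaries = [n]
    while boundaries[-1] > 0:
        boundaries.append(int(prev[boundaries[-1]]))
    return boundaries[::-1]
```

On a toy input made of four near-identical bars, four bars of a different kind, and four bars of the first kind again, chaining the three functions yields boundaries at bars 0, 4, 8 and 12. Swapping `compress_bars_pca` for an NMF or a song-specific autoencoder would leave the rest of the sketch unchanged, since the segmentation step only sees the self-similarity matrix.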