This paper presents a comparative analysis of two artificial neural networks with different architectures for the task of tempo estimation. To this end, it proposes the modeling, training, and evaluation of a B-RNN (Bidirectional Recurrent Neural Network) model capable of estimating the tempo of musical pieces in bpm (beats per minute) without using external auxiliary modules. An extensive database (12,550 pieces in total), which also includes percussion-only tracks, was curated to support a quantitative and qualitative analysis of the experiments. The performance of the B-RNN is compared with that of state-of-the-art models. For further comparison, a state-of-the-art CNN was also retrained on the same datasets used to train the B-RNN. Evaluation results for each model and dataset are presented and discussed, along with observations and ideas for future research. Tempo estimation was more accurate on the percussion-only dataset, suggesting that percussion-only tracks in general may allow more accurate estimation, although further experiments with more such datasets are needed to gather stronger evidence.