We propose a novel method for modeling hierarchical metrical structures in both symbolic music and audio signals in a self-supervised manner with minimal domain knowledge. The model is trained on, and performs inference over, beat-aligned music signals, predicting an 8-layer hierarchical metrical tree that spans the beat, measure, and section levels. The training procedure requires no hierarchical metrical labels beyond beats, relying purely on metrical regularity and inter-voice consistency as inductive biases. Experiments show that the method achieves performance comparable to supervised baselines on multiple metrical structure analysis tasks for both symbolic music and audio signals. All demos, source code, and pre-trained models are publicly available on GitHub.
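As a purely illustrative sketch (not the paper's implementation), an 8-layer metrical tree over beat-aligned input can be thought of as a recursive grouping of beat spans, with each node covering a contiguous range of beats and splitting into lower-level groups. The node structure and the toy binary-splitting rule below are assumptions made only to make the output format concrete.

```python
# Illustrative only: one possible representation of an 8-layer metrical tree,
# where level 0 corresponds to beats and higher levels to measures and sections.
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetricalNode:
    level: int                                   # 0 = beat; up to 7 = section (assumed convention)
    start_beat: int                               # inclusive beat index
    end_beat: int                                 # exclusive beat index
    children: List["MetricalNode"] = field(default_factory=list)


def build_toy_tree(start: int, end: int, level: int) -> MetricalNode:
    """Build a toy metrical tree by recursively halving the beat span."""
    node = MetricalNode(level, start, end)
    if level > 0 and end - start > 1:
        mid = (start + end) // 2
        node.children = [
            build_toy_tree(start, mid, level - 1),
            build_toy_tree(mid, end, level - 1),
        ]
    return node


# Example: an 8-level tree (levels 7..0) over 128 beat-aligned frames.
root = build_toy_tree(0, 128, level=7)
```

A real metrical analysis would of course allow irregular groupings (e.g. triple meter or uneven section lengths) rather than the fixed binary splits used in this toy example.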