While neural network models are making significant progress in piano transcription, they are becoming more resource-intensive, requiring larger model sizes and more computing power. In this paper, we incorporate more prior knowledge about the piano to reduce model size and improve transcription performance. The sound of a piano note contains various overtones, and the pitch of a key does not change over time. To make full use of this latent information, we propose HPPNet, which uses Harmonic Dilated Convolution to capture harmonic structures and a Frequency Grouped Recurrent Neural Network to model pitch invariance over time. Experimental results on the MAESTRO dataset show that our piano transcription system achieves state-of-the-art performance in both frame and note scores (frame F1 93.15%, note F1 97.18%). Moreover, the model is much smaller than previous state-of-the-art deep learning models.
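To illustrate the harmonic prior mentioned above, the following minimal sketch computes where the overtones of a note fall on a log-frequency axis. It assumes a spectrogram with a fixed number of bins per octave, in which the k-th harmonic lies roughly bins_per_octave * log2(k) bins above the fundamental; offsets of this kind could serve as dilation distances for a convolution that spans harmonics. The function name `harmonic_bin_offsets` and the values 48 bins per octave and 8 harmonics are illustrative assumptions, not details taken from the abstract, and this is not the authors' implementation.

```python
import math


def harmonic_bin_offsets(bins_per_octave, n_harmonics):
    """Bin offsets of harmonics 1..n_harmonics above the fundamental.

    Assumes a log-frequency axis, where a frequency ratio of k
    corresponds to a shift of bins_per_octave * log2(k) bins.
    """
    return [
        round(bins_per_octave * math.log2(k))
        for k in range(1, n_harmonics + 1)
    ]


# Hypothetical setting: 48 bins per octave, first 8 harmonics.
offsets = harmonic_bin_offsets(bins_per_octave=48, n_harmonics=8)
print(offsets)  # [0, 48, 76, 96, 111, 124, 135, 144]
```

Because the offsets depend only on the harmonic number and not on the fundamental, the same set of distances applies to every key, which is what makes a fixed harmonic spacing usable across the whole frequency axis.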