Motivated by state-of-the-art psychological research, we note that a piano performance transcribed with existing Automatic Music Transcription (AMT) methods cannot be successfully resynthesized without affecting the artistic content of the performance. This is due to 1) the different mappings between MIDI parameters used by different instruments, and 2) the fact that musicians adapt their way of playing to the surrounding acoustic environment. To address this issue, we propose a methodology for building acoustics-specific AMT systems that can model the adaptations musicians apply to convey their interpretation. Specifically, we train models tailored to virtual instruments in a modular architecture that takes as input an audio recording and the corresponding aligned music score, and outputs the acoustics-specific velocity of each note. We test different model architectures and show that the proposed methodology generally outperforms the usual AMT pipeline, which does not consider the specificities of the instrument and the acoustic environment. Notably, the methodology extends in a straightforward way, since only modest effort is required to train models for inferring other piano parameters, such as pedaling.