This study focuses on how music performances are perceived when contextual factors, such as room acoustics and instrument, change. We propose distinguishing the concept of "performance" from that of "interpretation", which expresses the "artistic intention". To assess this distinction, we carried out an experimental evaluation in which 91 subjects listened to audio recordings created by resynthesizing MIDI data obtained from Automatic Music Transcription (AMT) systems and from a sensorized acoustic piano. During resynthesis, we simulated different contexts and asked listeners to rate how much the interpretation changes when the context changes. The results show that (1) the MIDI format alone cannot fully capture the artistic intention of a music performance, and (2) the usual objective evaluation measures based on MIDI data show low correlation with the average subjective evaluation. To bridge this gap, we propose a novel measure that is meaningfully correlated with the outcome of the tests. In addition, we investigate multimodal machine learning by providing a new score-informed AMT method, and we propose an approximation algorithm for the $p$-dispersion problem.