We propose a system for rendering a symbolic piano performance with flexible musical expression. Creating a new performance that conveys varied emotions or nuances requires active control over musical expression. However, previous approaches were limited to following the composer's guidelines for musical expression or handled only a subset of the musical attributes. We aim to disentangle the entire musical expression and the structural attribute of piano performance using a conditional VAE framework, which stochastically generates expressive parameters from latent representations and given note structures. In addition, we employ self-supervised approaches that force the latent variables to represent target attributes. Finally, we leverage a two-step encoder and decoder that learn hierarchical dependencies to enhance the naturalness of the output. Experimental results show that our system can stably generate performance parameters relevant to the given musical scores, learn disentangled representations, and control musical attributes independently of each other.
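For orientation, the generic conditional VAE objective that such a framework typically builds on can be written as below. This is the textbook evidence lower bound, not the system's stated loss: the symbols are assumptions introduced here for illustration, with $x$ standing for the expressive performance parameters, $c$ for the given note structure, and $z$ for the latent representation. The self-supervised terms and the two-step hierarchical encoder and decoder mentioned above are not captured by this form.

```latex
% Generic conditional VAE evidence lower bound (illustrative notation):
%   x : expressive performance parameters (assumed symbol)
%   c : note structure from the score     (assumed symbol)
%   z : latent representation             (assumed symbol)
\begin{equation}
\mathcal{L}(\theta, \phi; x, c)
  = \mathbb{E}_{q_\phi(z \mid x, c)}\!\left[ \log p_\theta(x \mid z, c) \right]
  - D_{\mathrm{KL}}\!\left( q_\phi(z \mid x, c) \,\|\, p_\theta(z \mid c) \right)
\end{equation}
```

At generation time, under this generic form, one would sample $z$ from the prior (or set it by hand to steer an attribute) and decode $p_\theta(x \mid z, c)$ for the given score.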