Experiments probing the sensorimotor neural interactions in the human cortical speech system support the existence of a bidirectional flow of interactions between the auditory and motor regions. Their key function is to enable the brain to 'learn' how to control the vocal tract for speech production. This idea is the impetus for the recently proposed "MirrorNet", a constrained autoencoder architecture. In this paper, the MirrorNet is applied to learn, in an unsupervised manner, the controls of a specific audio synthesizer (DIVA) so as to produce melodies only from their auditory spectrograms. The results demonstrate how the MirrorNet discovers the synthesizer parameters needed to generate melodies that closely resemble the originals, generalizes to unseen melodies, and even determines the best set of parameters to approximate renditions of complex piano melodies generated by a different synthesizer. This generalizability of the MirrorNet illustrates its potential to discover, from sensory data alone, the controls of arbitrary motor plants such as autonomous vehicles.
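The core idea of the constrained autoencoder can be sketched in a few lines of NumPy: the decoder is replaced by a fixed "synthesizer" (here a toy linear stand-in for DIVA, purely illustrative), and only the encoder, which maps spectrograms to control parameters, is trained, using nothing but the audio reconstruction loss. No ground-truth controls are ever shown to the model, which is what makes the learning unsupervised. This is a minimal sketch under those assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed "synthesizer": maps control parameters to a spectrogram frame.
# (A toy linear stand-in for DIVA; MirrorNet itself uses a learned
#  differentiable forward model of the real synthesizer.)
n_ctrl, n_freq = 4, 16
B = rng.normal(size=(n_freq, n_ctrl))  # synthesizer's fixed response matrix

# Unlabeled training spectrograms; the true controls C_true are used only
# to generate the data and are never seen by the encoder during training.
C_true = rng.normal(size=(n_ctrl, 64))
S = B @ C_true                         # (n_freq, n_samples)

# Encoder: spectrogram -> controls, trained only on the audio-domain loss.
W = rng.normal(size=(n_ctrl, n_freq)) * 0.01

def loss(W):
    R = B @ (W @ S)                    # re-synthesized spectrograms
    return np.mean((R - S) ** 2)

lr = 0.01
losses = [loss(W)]
for _ in range(500):
    R = B @ (W @ S)
    # Analytic gradient of the mean-squared reconstruction error w.r.t. W.
    grad = 2.0 * B.T @ (R - S) @ S.T / (S.shape[1] * n_freq)
    W -= lr * grad
    losses.append(loss(W))
```

After training, `W @ S` recovers controls that re-synthesize the input spectrograms, mirroring how MirrorNet discovers DIVA parameters from audio alone; the reconstruction loss falls steadily over the gradient steps.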