We explore the use of a neural network inspired by predictive coding for modeling human music perception. This network was developed based on the computational neuroscience theory of recurrent interactions in the hierarchical visual cortex. When trained on video data with self-supervised learning, the model exhibits behaviors consistent with human visual illusions. Here, we adapt this network to model the hierarchical auditory system and investigate whether it makes judgments similar to those of humans regarding the musicality of a set of random pitch sequences. When the model is trained on a large corpus of instrumental classical music and popular melodies rendered as mel spectrograms, it produces greater prediction errors for the random pitch sequences that human subjects rate as less musical. We find that the prediction error depends on the amount of information available about the subsequent note, the pitch interval, and the temporal context. Our findings suggest that predictability is correlated with human perception of musicality and that a predictive coding neural network trained on music can be used to characterize the features and motifs contributing to human perception of music.
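The evaluation idea described above can be illustrated with a minimal sketch: render a pitch sequence as a mel spectrogram and score it by frame-wise prediction error. This is not the authors' code; the rendering parameters are assumptions, and a trivial "repeat-last-frame" predictor stands in for the trained predictive coding network, which in the actual study supplies the next-frame predictions.

```python
# Minimal sketch, assuming librosa/numpy; parameters (sample rate, note duration,
# number of mel bands) are illustrative, not taken from the paper.
import numpy as np
import librosa

SR = 22050        # sample rate (assumed)
NOTE_DUR = 0.25   # seconds per note (assumed)

def render_pitch_sequence(midi_pitches, sr=SR, note_dur=NOTE_DUR):
    """Render a sequence of MIDI pitches as concatenated sine tones."""
    t = np.linspace(0, note_dur, int(sr * note_dur), endpoint=False)
    tones = [np.sin(2 * np.pi * librosa.midi_to_hz(p) * t) for p in midi_pitches]
    return np.concatenate(tones).astype(np.float32)

def mel_frames(y, sr=SR, n_mels=64):
    """Log-mel spectrogram, time-major: one row per frame."""
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(S, ref=np.max).T

def prediction_error(frames):
    """Mean squared error between each frame and its predicted frame.

    A trained predictive coding network would supply the predictions;
    here the previous frame is used as a placeholder predictor.
    """
    predicted = frames[:-1]   # placeholder: predict "no change"
    actual = frames[1:]
    return float(np.mean((actual - predicted) ** 2))

# Example: a stepwise sequence versus a random large-leap sequence.
stepwise = [60, 62, 64, 65, 67, 69, 71, 72]
leapy = [60, 83, 48, 79, 52, 90, 45, 84]
for name, seq in [("stepwise", stepwise), ("random leaps", leapy)]:
    err = prediction_error(mel_frames(render_pitch_sequence(seq)))
    print(f"{name}: prediction error = {err:.3f}")
```

In the study itself, the scalar prediction error per sequence is what gets compared against human musicality ratings; the placeholder predictor above only shows where the trained network's output would plug in.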