Timbre representations of musical instruments, which are essential for diverse applications such as musical audio synthesis and separation, can be learned as bottleneck features from an instrument recognition model. Given the similarities between speaker recognition and musical instrument recognition, in this paper we investigate how to adapt successful speaker recognition algorithms to musical instrument recognition in order to learn meaningful instrument timbre representations. To address the mismatch between musical audio and models devised for speech, we introduce a group of trainable filters that generate appropriate acoustic features from input raw waveforms, making the model easier to optimize in an input-agnostic, end-to-end manner. In experiments on both the NSynth and RWC databases, covering both closed-set musical instrument identification and open-set verification scenarios, the modified speaker recognition model generated discriminative embeddings for instrument and instrument-family identities. We further conducted extensive experiments to characterize the information encoded in the learned timbre embeddings.
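To make the front-end concrete, below is a minimal PyTorch sketch of one way such a trainable filter bank could operate directly on raw waveforms before a speaker-recognition-style backbone. The class name, filter count, filter length, and hop size are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class TrainableFilterbank(nn.Module):
    """Learnable 1-D convolutional filter bank applied to raw waveforms.

    A sketch of the idea described in the abstract: a bank of trainable
    filters replaces fixed speech-oriented features (e.g., mel filter
    banks) so the front-end is optimized end to end with the recognition
    model. All hyperparameters here are assumed values for illustration.
    """

    def __init__(self, n_filters=40, filter_len=401, hop=160):
        super().__init__()
        # Each output channel is one trainable filter; the stride acts
        # as the analysis hop size.
        self.filters = nn.Conv1d(1, n_filters, kernel_size=filter_len,
                                 stride=hop, padding=filter_len // 2,
                                 bias=False)

    def forward(self, waveform):
        # waveform: (batch, samples) -> (batch, 1, samples)
        x = waveform.unsqueeze(1)
        x = self.filters(x)              # (batch, n_filters, frames)
        # Log-compressed magnitude, analogous to a log mel spectrogram.
        return torch.log(x.abs() + 1e-6)

# Usage: a batch of two 1-second clips at 16 kHz yields a
# (batch, n_filters, frames) feature map for the downstream model.
features = TrainableFilterbank()(torch.randn(2, 16000))
```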