Extensive work has tackled Language Identification (LID) in the speech domain; however, its application to the singing voice lags behind, and performance on Singing Language Identification (SLID) can be improved by leveraging recent progress in other singing-related tasks. This work presents a modernized phonotactic system for SLID on polyphonic music: phoneme recognition is performed with a Connectionist Temporal Classification (CTC)-based acoustic model trained on multilingual data, followed by language classification with a recurrent model operating on the estimated phonemes. The full pipeline is trained and evaluated on a large, publicly available dataset, achieving unprecedented performance. First results of SLID on out-of-set languages are also presented.
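The following minimal Python sketch illustrates the two-stage phonotactic idea under stated assumptions. It applies best-path (greedy) CTC decoding to per-frame phoneme posteriors, then scores the decoded phoneme sequence with a toy Elman-style recurrent classifier. Several details are hypothetical illustrations rather than the system described above: the names `ctc_greedy_decode` and `ToyPhonemeRNNClassifier`, the blank at index 0, and the randomly initialized, untrained weights. The actual system may feed phoneme posteriors, rather than a decoded sequence, into a trained recurrent model of a different architecture.

```python
import math
import random
from typing import List, Sequence

BLANK = 0  # hypothetical: CTC blank symbol at output index 0


def ctc_greedy_decode(frame_posteriors: Sequence[Sequence[float]]) -> List[int]:
    """Best-path CTC decoding: per-frame argmax, merge repeats, then drop blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_posteriors]
    decoded: List[int] = []
    prev = None
    for idx in best_path:
        # Emit a label only when it differs from the previous frame's argmax
        # and is not blank, so the path "a a _ a" decodes to "a a".
        if idx != prev and idx != BLANK:
            decoded.append(idx)
        prev = idx
    return decoded


class ToyPhonemeRNNClassifier:
    """Hypothetical Elman RNN over phoneme indices; weights are random, not trained."""

    def __init__(self, n_symbols: int, n_languages: int, hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden = hidden
        self.w_in = mat(hidden, n_symbols)     # input-to-hidden
        self.w_rec = mat(hidden, hidden)       # hidden-to-hidden (recurrence)
        self.w_out = mat(n_languages, hidden)  # hidden-to-language logits

    def predict_proba(self, phonemes: Sequence[int]) -> List[float]:
        h = [0.0] * self.hidden
        for ph in phonemes:
            # With a one-hot input, W_in @ x reduces to column `ph` of W_in.
            h = [
                math.tanh(self.w_in[i][ph] + sum(self.w_rec[i][j] * h[j] for j in range(self.hidden)))
                for i in range(self.hidden)
            ]
        logits = [sum(w * hj for w, hj in zip(row, h)) for row in self.w_out]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Usage: 6 frames over {blank, p1, p2, p3}; the values are made up for illustration.
posteriors = [
    [0.10, 0.80, 0.05, 0.05],
    [0.10, 0.70, 0.10, 0.10],
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.10, 0.10, 0.70],
    [0.20, 0.10, 0.60, 0.10],
    [0.10, 0.10, 0.70, 0.10],
]
phonemes = ctc_greedy_decode(posteriors)  # -> [1, 3, 2]
clf = ToyPhonemeRNNClassifier(n_symbols=4, n_languages=3)
probs = clf.predict_proba(phonemes)  # one (untrained) probability per language
```

Greedy decoding is used here only because it is the simplest CTC decoder. Common alternatives include beam search and passing the full posterior matrix to the classifier so that phoneme uncertainty is preserved.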