This memo describes the NTR-TSU submission to the SIGTYP 2021 Shared Task on predicting language IDs from speech. Spoken Language Identification (LID) is an important step in a multilingual Automatic Speech Recognition (ASR) pipeline. For many low-resource and endangered languages, only single-speaker recordings may be available, which creates a need for domain- and speaker-invariant language ID systems. In this memo, we show that a convolutional neural network with a Self-Attentive Pooling layer achieves promising results on the language identification task.
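As a rough illustration of what a self-attentive pooling layer does, the sketch below collapses a variable-length sequence of frame-level feature vectors (such as a CNN front-end would produce) into one fixed-size utterance vector. It uses the formulation common in speaker and language recognition: score each frame as u^T tanh(W h_t + b), softmax the scores over time, and take the weighted sum of frames. This is an assumption about the general technique, not the submission's actual implementation. The function `self_attentive_pool` and the parameters `W`, `b`, `u` are hypothetical names, the weights are random rather than learned, and the CNN itself is not shown.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply matrix W (rows x cols) by vector x (cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attentive_pool(frames: Matrix, W: Matrix, b: Vector, u: Vector) -> Vector:
    """Hypothetical self-attentive pooling sketch.

    frames: T frame vectors h_t, each of dimension D
    W: A x D projection, b: A bias, u: A attention context vector
    Returns one D-dimensional utterance-level vector.
    """
    scores = []
    for h in frames:
        # e_t = tanh(W h_t + b); score_t = u . e_t
        e = [math.tanh(z + bi) for z, bi in zip(matvec(W, h), b)]
        scores.append(sum(ui * ei for ui, ei in zip(u, e)))
    weights = softmax(scores)  # attention weights over time, sum to 1
    dim = len(frames[0])
    # Weighted average of the frames under the attention weights
    return [sum(w * h[d] for w, h in zip(weights, frames)) for d in range(dim)]


# Usage with random (untrained) parameters
random.seed(0)
T, D, A = 5, 4, 3
frames = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(T)]
W = [[random.gauss(0.0, 0.1) for _ in range(D)] for _ in range(A)]
b = [0.0] * A
u = [random.gauss(0.0, 1.0) for _ in range(A)]
pooled = self_attentive_pool(frames, W, b, u)
print(len(pooled))  # 4: one vector per utterance, regardless of T
```

Because the output size depends only on the feature dimension D and not on the number of frames T, recordings of different lengths map to vectors that a fixed classifier head can consume. With `u` set to all zeros, every frame gets equal weight and the layer reduces to plain mean pooling.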