Although language identification is a fundamental speech and language processing task, it remains challenging for many languages and language families. For many low-resource and endangered languages, this is due in part to resource availability: where larger datasets exist, they may be single-speaker or drawn from domains that differ from the desired application scenarios, creating a need for domain- and speaker-invariant language identification systems. This year's shared task on robust spoken language identification sought to investigate exactly this scenario: systems were to be trained on largely single-speaker speech from one domain, but evaluated on data from other domains, recorded from speakers under different recording circumstances, mimicking realistic low-resource scenarios. We find that domain and speaker mismatch proves very challenging for current methods, which can exceed 95% accuracy in-domain. Domain adaptation can address this mismatch to some degree, but these conditions merit further investigation to make spoken language identification accessible in many scenarios.