When documenting oral languages, Unsupervised Word Segmentation (UWS) from speech is a useful yet challenging task. It can be performed from phonetic transcriptions or, in the absence of these, from the output of unsupervised speech discretization models. These discretization models are trained using raw speech only, producing discrete speech units that can be used for downstream (text-based) tasks. In this paper we compare five such models, three Bayesian and two neural approaches, with regard to the exploitability of the produced units for UWS. We experiment with two UWS models and report results for Finnish, Hungarian, Mboshi, Romanian and Russian in a low-resource setting (using only 5k sentences). Our results suggest that the neural models for speech discretization are difficult to exploit in our setting, and that it might be necessary to adapt them to limit the length of the produced sequences. We obtain our best UWS results with the SHMM and H-SHMM Bayesian models, which produce high-quality yet compressed discrete representations of the input speech signal.
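To make the two-stage pipeline the abstract refers to concrete (speech discretization first, then UWS over the resulting units), the following is a minimal, hypothetical Python sketch. The toy unit sequences stand in for discretizer output (a real system, e.g. SHMM/H-SHMM or a neural model, would produce them from raw speech), and the segmenter is a simple transitional-probability heuristic used purely for illustration; it is not one of the two UWS models compared in the paper, and all names and the threshold value are assumptions.

    from collections import defaultdict
    from typing import List

    def segment_by_transitional_probability(utterances: List[List[str]],
                                            threshold: float = 0.7) -> List[List[List[str]]]:
        """Insert a word boundary wherever P(next unit | current unit) falls
        below `threshold`. A classic UWS heuristic, shown for illustration
        only; not one of the paper's actual UWS models."""
        bigram = defaultdict(int)      # counts of unit pairs (a, b)
        left_count = defaultdict(int)  # counts of a as a left context
        for utt in utterances:
            for a, b in zip(utt, utt[1:]):
                bigram[(a, b)] += 1
                left_count[a] += 1
        segmented = []
        for utt in utterances:
            words, word = [], [utt[0]]
            for a, b in zip(utt, utt[1:]):
                if bigram[(a, b)] / left_count[a] < threshold:
                    words.append(word)  # low predictability -> boundary
                    word = []
                word.append(b)
            words.append(word)
            segmented.append(words)
        return segmented

    # Toy pseudo-phone sequences standing in for discretizer output.
    units = [list("abcabcxyz"), list("xyzabc"), list("abcxyzxyz")]
    for utt_words in segment_by_transitional_probability(units):
        print(" ".join("".join(w) for w in utt_words))
    # toy output:
    # abc abc xyz
    # xyz abc
    # abc xyz xyz

The sketch also illustrates why sequence length matters for UWS exploitability: the longer and noisier the discrete unit sequences, the less reliable the unit-to-unit statistics that any downstream segmenter relies on.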