When beginners learn to speak a non-native language, it is difficult for them to judge for themselves whether they are speaking well. Computer-assisted pronunciation training systems are therefore used to detect learner mispronunciations. These systems typically compare the user's speech with that of a specific native speaker serving as a model, at the level of rhythm, phonemes, or words, and calculate the differences. However, they either require extensive speech data with detailed annotations or can compare only against one specific native speaker. To overcome these problems, we propose a new language learning support system that calculates speech scores and detects beginners' mispronunciations from a small amount of unannotated speech data, without comparison to a specific person. The proposed system uses deep learning-based speech processing to display the pronunciation score of the learner's speech and the difference/distance between the learner's pronunciation and that of a group of models in an intuitive visual form. Learners can gradually improve their pronunciation by eliminating the differences and shortening the distance from the models until they become sufficiently proficient. Furthermore, since the pronunciation score and the difference/distance are not calculated against specific sentences from a particular model, users are free to study whichever sentences they wish. We also built an application to help non-native speakers learn English and confirmed that it can improve users' speech intelligibility.
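As a rough illustration of comparing a learner against a group of models rather than a single speaker, the sketch below measures how far a learner's utterance lies from the centroid of several model utterances and maps that distance to a score. Everything here is an assumption made for illustration: the abstract does not say how speech is represented, which distance is used, or how the score is derived. The fixed-length embedding vectors, the Euclidean distance, the exponential mapping, and the names `centroid`, `distance_to_group`, `pronunciation_score`, and `scale` are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one model embedding")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all embeddings must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_to_group(learner: Vector, models: List[Vector]) -> float:
    """Distance from the learner's embedding to the centroid of the model group.

    Assumption: each utterance has already been turned into a fixed-length
    embedding by some speech model; that step is not shown here.
    """
    return euclidean(learner, centroid(models))


def pronunciation_score(learner: Vector, models: List[Vector],
                        scale: float = 1.0) -> float:
    """Map the distance to a score in (0, 100]; 100 means zero distance.

    The exponential mapping and the `scale` parameter are hypothetical
    choices, not taken from the paper.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return 100.0 * math.exp(-distance_to_group(learner, models) / scale)
```

Because the reference is a group centroid rather than one recording of one sentence, a sketch like this does not depend on a specific speaker, which matches the motivation stated in the abstract. Whether the actual system averages the models in this way is not stated.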