Speech evaluation is an essential component of computer-assisted language learning (CALL). While speech evaluation for English has been widely studied, automatic speech scoring for low-resource languages remains challenging. Prior work in this area has focused on monolingual, language-specific designs and on handcrafted features derived from resource-rich languages such as English. Such approaches are often difficult to generalize to other languages, especially when suprasegmental qualities such as rhythm must also be assessed. In this work, we examine three languages with distinct rhythm patterns: English (stress-timed), Malay (syllable-timed), and Tamil (mora-timed). We exploit robust feature representations inspired by music processing and vector representation learning. Empirical evaluation shows consistent gains for all three languages when predicting pronunciation, rhythm, and intonation performance.
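The rhythm-class labels above (stress-timed, syllable-timed, mora-timed) are often quantified in the phonetics literature with durational metrics. One widely used example is the normalized Pairwise Variability Index (nPVI), which averages the normalized difference between successive interval durations (e.g. vowels): stress-timed languages tend to alternate long and short vowels and so score higher. The sketch below is only an illustration of that distinction under stated assumptions; it is not the music-processing-inspired feature set used in this work, and the `npvi` function name and the example durations are hypothetical.

```python
from typing import Sequence


def npvi(durations: Sequence[float]) -> float:
    """Normalized Pairwise Variability Index over successive interval durations.

    nPVI = 100 / (m - 1) * sum_k |d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2)
    Illustrative sketch only; not the feature extractor described in this work.
    """
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    total = 0.0
    # Compare each interval with its successor, normalized by the pair mean.
    for a, b in zip(durations, durations[1:]):
        mean = (a + b) / 2
        if mean <= 0:
            raise ValueError("durations must be positive")
        total += abs(a - b) / mean
    return 100.0 * total / (len(durations) - 1)


# Hypothetical vowel durations in seconds (made-up values, not measured data).
alternating = [0.18, 0.06, 0.20, 0.05, 0.17, 0.07]  # long/short alternation
even = [0.10, 0.11, 0.10, 0.09, 0.10, 0.11]  # roughly equal durations

print(round(npvi(alternating), 1), round(npvi(even), 1))
```

The alternating sequence yields a much larger nPVI than the evenly timed one, which is the intuition behind treating rhythm as a language-dependent quality that handcrafted English-centric features may not capture.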