This work introduces an approach to assessing phrase breaks in ESL learners' speech using pre-trained language models (PLMs). Unlike traditional methods, the proposed approach converts speech into token sequences and then leverages the power of PLMs. There are two sub-tasks: overall assessment of phrase breaks for a speech clip, and fine-grained assessment of every possible phrase break position. Speech input is first force-aligned with its text, then pre-processed into a token sequence that contains the words and their associated phrase break information. The token sequence is then fed into a pre-training and fine-tuning pipeline. In pre-training, a replaced break token detection module is trained on token data in which each token has a certain probability of being randomly replaced. In fine-tuning, overall scoring and fine-grained scoring are optimized with text classification and sequence labeling pipelines, respectively. Introducing PLMs greatly reduces the dependence on labeled training data and improves performance.
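The pre-processing and pre-training data steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the break-token names (`<br0>`, `<br1>`, `<br2>`), the pause thresholds, the 15% replacement rate, and all function names are hypothetical choices made for illustration, and restricting replacement to break tokens is an assumption inferred from the module's name. The forced alignment is assumed to be available as `(word, start_sec, end_sec)` tuples.

```python
import random

# Hypothetical break-token inventory: the pause after each word is bucketed
# into three classes. The abstract does not specify the actual inventory.
BREAK_TOKENS = ["<br0>", "<br1>", "<br2>"]


def pause_to_break(pause_sec):
    """Map an inter-word pause to a break token (thresholds are assumptions)."""
    if pause_sec < 0.05:
        return "<br0>"  # no perceptible break
    if pause_sec < 0.30:
        return "<br1>"  # short break
    return "<br2>"      # long break


def alignment_to_tokens(alignment):
    """Interleave words with break tokens.

    alignment: list of (word, start_sec, end_sec) tuples, as a forced
    aligner might produce them (the format is an assumption).
    """
    tokens = []
    for i, (word, _start, end) in enumerate(alignment):
        tokens.append(word)
        if i + 1 < len(alignment):
            next_start = alignment[i + 1][1]
            tokens.append(pause_to_break(next_start - end))
    return tokens


def corrupt_breaks(tokens, replace_prob=0.15, rng=None):
    """Build one replaced-break-token-detection training example.

    Each break token is swapped for a different break token with
    probability replace_prob. Returns (corrupted_tokens, labels), where
    labels[i] is 1 if position i was replaced and 0 otherwise.
    """
    rng = rng or random.Random()
    corrupted, labels = [], []
    for tok in tokens:
        if tok in BREAK_TOKENS and rng.random() < replace_prob:
            alternatives = [b for b in BREAK_TOKENS if b != tok]
            corrupted.append(rng.choice(alternatives))
            labels.append(1)
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels
```

A detector trained on such pairs would predict the per-position labels from the corrupted sequence. The same token sequences could then be reused for fine-tuning, with one label per clip for overall scoring and one label per break position for fine-grained scoring.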