Disfluencies are prevalent in spontaneous speech, as shown in many studies of adult speech. Less is understood about children's speech, especially that of pre-school children who are still developing their language skills. We present a novel dataset of spontaneous explanations, annotated for disfluencies, from 26 children (ages 5--8) who were interviewed twice over a year-long period. Our preliminary analysis reveals significant differences between children's speech in our corpus and adult spontaneous speech from two corpora (Switchboard and CallHome). Children have higher disfluency and filler rates, tend to use nasal filled pauses more frequently, and on average exhibit longer reparandums than repairs, in contrast to adult speakers. Despite these differences, an automatic disfluency detection system trained on adult (Switchboard) speech transcripts performs reasonably well on children's speech, achieving an F1 score that is 10\% higher than its score on an adult out-of-domain dataset (CallHome).