Expressive reading, considered the defining attribute of oral reading fluency, comprises the prosodic realization of phrasing and prominence. When evaluating oral reading, it helps establish the speaker's comprehension of the text. We consider a labeled dataset of children's reading recordings for the speaker-independent detection of prominent words using acoustic-prosodic and lexico-syntactic features. A well-tuned random forest ensemble predictor from prior work is replaced by an RNN sequence classifier to exploit potential context dependencies across the longer utterance. Further, deep learning is applied to obtain word-level features from low-level acoustic contours of fundamental frequency, intensity and spectral shape in an end-to-end fashion. Performance comparisons are presented across the different feature types and across different feature-learning architectures for prominent word prediction, to draw insights wherever possible.
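The pipeline described above (frame-level acoustic contours pooled into word-level features, then a sequence classifier labeling each word as prominent or not) can be illustrated with a minimal, untrained sketch in plain Python. The abstract does not specify the architecture, so everything here is an assumption: a 1-D convolution with max-pooling over time stands in for the frame-to-word encoder, an Elman-style bidirectional RNN stands in for the sequence classifier, and the names `word_features`, `predict_prominence` and `init_params`, the three per-frame inputs `[f0, intensity, spectral_tilt]`, and all layer sizes are hypothetical. Lexico-syntactic features are omitted. They could be concatenated onto each word vector before the RNN.

```python
import math
import random


def rand_matrix(rows, cols, rng, scale=0.1):
    """Random (untrained) weight matrix stored as a list of rows."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def word_features(frames, conv, width=3):
    """Hypothetical frame-to-word encoder: 1-D convolution over frame-level
    contours followed by max-pooling over time -> one vector per word."""
    dim = len(frames[0])
    padded = frames + [[0.0] * dim] * (width - 1)  # zero-pad so short words still fill a window
    outputs = []
    for t in range(len(frames)):
        window = [x for frame in padded[t:t + width] for x in frame]  # flatten the window
        outputs.append([math.tanh(s) for s in matvec(conv, window)])
    return [max(col) for col in zip(*outputs)]  # max-pool each filter over time


def rnn_pass(xs, W, U, b):
    """One direction of an Elman RNN: h_t = tanh(W x_t + U h_{t-1} + b)."""
    h = [0.0] * len(b)
    states = []
    for x in xs:
        h = [math.tanh(a + c + bi) for a, c, bi in zip(matvec(W, x), matvec(U, h), b)]
        states.append(h)
    return states


def predict_prominence(words_frames, params):
    """Per-word prominence probability from a (hypothetical) bidirectional RNN."""
    xs = [word_features(f, params["conv"]) for f in words_frames]
    fwd = rnn_pass(xs, params["Wf"], params["Uf"], params["bf"])
    bwd = rnn_pass(xs[::-1], params["Wb"], params["Ub"], params["bb"])[::-1]
    probs = []
    for hf, hb in zip(fwd, bwd):
        # Each word sees left context (hf) and right context (hb).
        z = sum(w * h for w, h in zip(params["v"], hf + hb)) + params["c"]
        probs.append(1.0 / (1.0 + math.exp(-z)))
    return probs


def init_params(frame_dim=3, n_filters=8, hidden=6, width=3, seed=0):
    """Random, untrained weights; sizes are illustrative only."""
    rng = random.Random(seed)
    return {
        "conv": rand_matrix(n_filters, width * frame_dim, rng),
        "Wf": rand_matrix(hidden, n_filters, rng),
        "Uf": rand_matrix(hidden, hidden, rng),
        "bf": [0.0] * hidden,
        "Wb": rand_matrix(hidden, n_filters, rng),
        "Ub": rand_matrix(hidden, hidden, rng),
        "bb": [0.0] * hidden,
        "v": [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden)],
        "c": 0.0,
    }


# Usage: three words with 5, 2 and 7 frames of synthetic [f0, intensity, spectral_tilt].
rng = random.Random(1)
utterance = [[[rng.random() for _ in range(3)] for _ in range(n)] for n in (5, 2, 7)]
probs = predict_prominence(utterance, init_params())
print([round(p, 3) for p in probs])
```

Because the weights are untrained, the printed probabilities carry no meaning. The point is the data flow: variable-length frame sequences become fixed-size word vectors, and the bidirectional pass lets each word's decision depend on context across the whole utterance, which is the motivation the abstract gives for replacing the per-word random forest.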