Alzheimer's disease is the most common form of dementia. Automatic detection from speech could help identify symptoms at early stages, so that preventive actions can be taken. This research is a contribution to the ADReSSo Challenge. We analyze the use of a state-of-the-art (SotA) automatic speech recognition (ASR) system to transcribe participants' spoken descriptions of a picture. We measure the loss of performance relative to human transcriptions, using the transcriptions from the 2020 ADReSS Challenge. We also study the influence of the language model on decoding the ASR hypotheses. A language model tends to correct non-standard word sequences, so we compare decoding with a language model against decoding without one. The aim is to study this language bias and to obtain more meaningful transcriptions based only on the acoustic information from patients. The proposed system combines acoustic features (based on prosody and voice quality) with lexical features based on the first occurrence of the most common words. The reported results show the effect of using automatic transcripts with and without a language model. The best fully automatic system achieves up to 76.06 % accuracy (without a language model). This is significantly higher, 3 % above, than a system that uses word transcriptions decoded with general-purpose language models.