Alzheimer's disease (AD) is a neurodegenerative disease that, because no cure is available, can seriously affect people's everyday lives if it is not diagnosed early. Alzheimer's is the most common cause of dementia, a general term for memory loss. Because dementia affects speech, existing research focuses on detecting dementia from spontaneous speech. However, little work has explored converting speech data into log-Mel spectrograms and Mel-frequency cepstral coefficients (MFCCs) and using pretrained models on them. Likewise, little work has examined the use of transformer networks or the way the two modalities, i.e., speech and transcripts, are combined in a single neural network. To address these limitations, we first employ several pretrained models, among which the Vision Transformer (ViT) achieves the highest evaluation results. Second, we propose multimodal models. More specifically, our models include a Gated Multimodal Unit, which controls the influence of each modality on the final classification, and crossmodal attention, which effectively captures the relationships between the two modalities. Extensive experiments on the ADReSS Challenge dataset demonstrate the effectiveness of the proposed models and their superiority over state-of-the-art approaches.
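The two fusion mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the standard Gated Multimodal Unit formulation, where a sigmoid gate computed from both inputs interpolates between tanh projections of each modality. It also assumes crossmodal attention is ordinary scaled dot-product attention in which one modality supplies the queries and the other supplies the keys and values. All dimensions, the random weights, and the function names `gated_multimodal_unit` and `crossmodal_attention` are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gated_multimodal_unit(x_a, x_t, W_a, W_t, W_z):
    """Hypothetical GMU fusing an audio vector x_a and a text vector x_t.

    h_a = tanh(W_a x_a), h_t = tanh(W_t x_t)
    z   = sigmoid(W_z [x_a; x_t])   (per-dimension gate in (0, 1))
    h   = z * h_a + (1 - z) * h_t   (gate weighs each modality)
    """
    h_a = np.tanh(W_a @ x_a)
    h_t = np.tanh(W_t @ x_t)
    z = sigmoid(W_z @ np.concatenate([x_a, x_t]))
    return z * h_a + (1.0 - z) * h_t, z


def crossmodal_attention(target, source, W_q, W_k, W_v):
    """Hypothetical crossmodal attention: target tokens attend to source tokens.

    target: (T, d_target), source: (S, d_source)
    Returns the attended features (T, d_k) and the weights (T, S).
    """
    Q = target @ W_q
    K = source @ W_k
    V = source @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights


# Hypothetical sizes: 16-d audio features, 12-d text features, 8-d fused space.
d_a, d_t, d_h = 16, 12, 8

# GMU on pooled (single-vector) representations of each modality.
x_a = rng.standard_normal(d_a)
x_t = rng.standard_normal(d_t)
W_a = rng.standard_normal((d_h, d_a)) * 0.1
W_t = rng.standard_normal((d_h, d_t)) * 0.1
W_z = rng.standard_normal((d_h, d_a + d_t)) * 0.1
fused, gate = gated_multimodal_unit(x_a, x_t, W_a, W_t, W_z)

# Crossmodal attention on token sequences: 7 text tokens attend to 5 audio patches.
audio_seq = rng.standard_normal((5, d_a))
text_seq = rng.standard_normal((7, d_t))
W_q = rng.standard_normal((d_t, d_h)) * 0.1
W_k = rng.standard_normal((d_a, d_h)) * 0.1
W_v = rng.standard_normal((d_a, d_h)) * 0.1
text_to_audio, attn = crossmodal_attention(text_seq, audio_seq, W_q, W_k, W_v)

print(fused.shape, text_to_audio.shape, attn.shape)  # (8,) (7, 8) (7, 5)
```

The design difference is worth noting. The GMU makes one learned, per-dimension decision about how much each modality contributes. Crossmodal attention, by contrast, lets every token of one modality select the most relevant parts of the other. In a full model, the fused output would feed a classification head. That head is omitted here.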
翻译:阿尔茨海默氏性阿尔茨海默氏病(AD)是一种神经退化性疾病,如果没有现成的治疗方法,这种疾病如果不被早期诊断,就会对人们日常生活产生严重后果。阿尔茨海默氏病是痴呆最常见的致痴呆症最常见的原因,这是失忆的一般术语。由于痴呆症影响言语,现有研究举措的重点是检测自发言中的痴呆症。然而,在将语言数据转换成日志-兆谱和梅尔频率阴部系数(MFCCs)以及使用预先培训的模型方面,工作很少。与此同时,在变异器网络的使用和两种模式(即语音和笔记本)相结合的方式方面,工作很少。为了克服这些局限性,我们首先采用了几个预先训练的模型,先是愿景变异体(VIT)取得最高评价结果。我们提出了多式联运模型。更具体地说,我们推出的模型包括Ged Muldomod 单元,以控制每种模式对最终分类和交叉关注的影响。同时,在两种模式(即语音和笔记式)的两种模式上,以有效的方式来展示它们所建的变革性的方法,从而展示了两个挑战性模式之间的对比。