Alzheimer's dementia (AD) affects memory, thinking, and language, progressively deteriorating a person's quality of life. Early diagnosis is crucial, since it enables the person to receive medical help and maintain quality of life. Consequently, leveraging spontaneous speech in conjunction with machine learning methods to recognize AD patients has emerged as an active research area. Most prior works employ Convolutional Neural Networks (CNNs) to process the input signal. However, designing a CNN architecture is a time-consuming process that requires domain expertise. Moreover, existing studies fuse modalities via early or late fusion, or simply concatenate the representations of the different modalities during training, so inter-modal interactions are not captured. To tackle these limitations, we first exploit a Neural Architecture Search (NAS) method to automatically find a high-performing CNN architecture. Next, we investigate several fusion methods, including Multimodal Factorized Bilinear (MFB) pooling and Tucker decomposition, to combine the speech and text modalities. To the best of our knowledge, no prior work has exploited a NAS approach or these fusion methods for dementia detection from spontaneous speech. We perform extensive experiments on the ADReSS Challenge dataset and show the effectiveness of our approach over state-of-the-art methods.
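To illustrate the kind of fusion the abstract refers to, the following is a minimal NumPy sketch of Multimodal Factorized Bilinear (MFB) pooling over a text and an audio feature vector. The projection matrices `U` and `V` are random here purely for illustration (in a trained model they are learned), and the dimensions `k` and `o` are assumed, not taken from the paper.

```python
import numpy as np

def mfb_fuse(x, y, k=5, o=16, rng=None):
    """Sketch of Multimodal Factorized Bilinear pooling.

    x, y: 1-D feature vectors from two modalities (e.g. text and audio).
    k:    factorization rank; o: fused output dimension.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # U and V would be learned parameters; random for this illustration.
    U = rng.standard_normal((x.size, k * o))
    V = rng.standard_normal((y.size, k * o))
    z = (x @ U) * (y @ V)                    # element-wise product captures
                                             # inter-modal interactions
    z = z.reshape(o, k).sum(axis=1)          # sum-pool over the rank dimension
    z = np.sign(z) * np.sqrt(np.abs(z))      # signed square-root normalization
    return z / (np.linalg.norm(z) + 1e-12)   # l2 normalization

# Hypothetical feature sizes: a 300-d text embedding and a 128-d audio embedding.
text_feat = np.ones(300)
audio_feat = np.ones(128)
fused = mfb_fuse(text_feat, audio_feat)
print(fused.shape)  # (16,)
```

The bilinear interaction `(x @ U) * (y @ V)` is what distinguishes this from plain concatenation: every fused output depends jointly on both modalities rather than on each one separately.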