Depression is a leading cause of death worldwide, and diagnosing it is nontrivial. Multimodal learning is a popular approach to automatic depression diagnosis, but existing work suffers from two main drawbacks: 1) high-order interactions between modalities are not well exploited; and 2) the models are weakly interpretable. To remedy these drawbacks, we propose a multimodal multi-order factor fusion (MMFF) method. Our method exploits the high-order interactions between modalities by extracting and assembling modality factors under the guidance of a shared latent proxy. We conduct extensive experiments on two recent and popular datasets, E-DAIC-WOZ and CMDC, and the results show that our method achieves significantly better performance than existing approaches. In addition, by analyzing the factor assembly process, our model can intuitively show the contribution of each factor, which helps us understand the fusion mechanism.