Husformer: A Multi-Modal Transformer for Multi-Modal Human State Recognition

Human state recognition is a critical topic with pervasive and important applications in human-machine systems. Multi-modal fusion, the combination of metrics from multiple data sources, has been shown to be a sound method for improving recognition performance. However, while recent multi-modal models have reported promising results, they generally fail to leverage sophisticated fusion strategies that would model sufficient cross-modal interactions when producing the fusion representation; instead, current methods rely on lengthy and inconsistent data preprocessing and feature crafting. To address this limitation, we propose Husformer, an end-to-end multi-modal transformer framework for multi-modal human state recognition. Specifically, we propose to use cross-modal transformers, which allow one modality to reinforce itself by directly attending to latent relevance revealed in other modalities, to fuse different modalities while ensuring sufficient awareness of the cross-modal interactions introduced. Subsequently, we utilize a self-attention transformer to further prioritize contextual information in the fusion representation. Using these two attention mechanisms enables effective and adaptive adjustment to noise and interruptions in multi-modal signals, both during the fusion process and at the level of high-level features. Extensive experiments on two human emotion corpora (DEAP and WESAD) and two cognitive workload datasets (MOCAS and CogLoad) demonstrate that, in recognizing human state, Husformer outperforms both state-of-the-art multi-modal baselines and the use of a single modality by a large margin, especially when dealing with raw multi-modal signals. We also conducted an ablation study to show the benefits of each component in Husformer.
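The two-stage fusion described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes single-head scaled dot-product attention, random (untrained) projection matrices, residual addition, and no layer normalization or feed-forward layers. The modality names `eeg` and `gsr`, the sequence lengths, the feature size `d`, and the helper `rand_proj` are hypothetical choices made only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(query_seq, context_seq, w_q, w_k, w_v):
    """Scaled dot-product attention.

    Queries come from query_seq; keys and values come from context_seq.
    Passing the same sequence for both gives self-attention.
    """
    q = query_seq @ w_q
    k = context_seq @ w_k
    v = context_seq @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
d = 8                              # assumed shared feature size
eeg = rng.normal(size=(10, d))     # hypothetical modality A, 10 time steps
gsr = rng.normal(size=(6, d))      # hypothetical modality B, 6 time steps

def rand_proj():
    # Untrained projection matrix, a stand-in for learned weights.
    return rng.normal(scale=d ** -0.5, size=(d, d))

# Stage 1: cross-modal attention. Each modality queries the other one.
eeg_from_gsr, w_cross = attention(eeg, gsr, rand_proj(), rand_proj(), rand_proj())
gsr_from_eeg, _ = attention(gsr, eeg, rand_proj(), rand_proj(), rand_proj())

# Reinforce each modality with what it attended to, then concatenate in time.
fused = np.concatenate([eeg + eeg_from_gsr, gsr + gsr_from_eeg], axis=0)

# Stage 2: self-attention over the fused sequence.
fused_out, w_self = attention(fused, fused, rand_proj(), rand_proj(), rand_proj())

print(w_cross.shape, fused_out.shape, w_self.shape)
```

In this sketch, each row of `w_cross` shows how strongly one time step of the first modality attends to every time step of the second, and `w_self` plays the same role within the fused sequence. A trained model would learn the projections rather than sample them.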
