We explore the efficacy of multimodal behavioral cues for explainable prediction of personality and interview-specific traits. We utilize elementary head-motion units named kinemes, atomic facial movements termed action units, and speech features to estimate these human-centered traits. Empirical results confirm that kinemes and action units enable the discovery of multiple trait-specific behaviors while also providing explanations in support of the predictions. For fusing cues, we explore decision-level and feature-level fusion, as well as an additive attention-based fusion strategy that quantifies the relative importance of the three modalities for trait prediction. Examining various long short-term memory (LSTM) architectures for classification and regression on the MIT Interview and First Impressions Candidate Screening (FICS) datasets, we note that: (1) multimodal approaches outperform their unimodal counterparts; (2) efficient trait prediction and plausible explanations are achieved with both unimodal and multimodal approaches; and (3) following the thin-slice approach, effective trait prediction is achieved even from two-second behavioral snippets.
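To make the additive attention-based fusion concrete, the following is a minimal sketch of how scalar attention weights can be computed over three per-modality embeddings (kinemes, action units, speech) and used both to fuse them for trait prediction and to expose the relative importance of each modality. The embedding dimension, layer names, and regression head are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of additive attention fusion over three modality embeddings.
# All dimensions and names (e.g., dim=64, AdditiveAttentionFusion) are
# assumptions for illustration, not the authors' exact design.
import torch
import torch.nn as nn

class AdditiveAttentionFusion(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        # Additive (Bahdanau-style) scoring: one scalar score per modality.
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, 1))
        self.head = nn.Linear(dim, 1)  # hypothetical trait-regression head

    def forward(self, kineme, au, speech):
        # Stack per-modality embeddings: (batch, 3, dim).
        feats = torch.stack([kineme, au, speech], dim=1)
        # Softmax over the modality axis yields relative modality importance.
        weights = torch.softmax(self.score(feats), dim=1)  # (batch, 3, 1)
        fused = (weights * feats).sum(dim=1)               # (batch, dim)
        return self.head(fused), weights.squeeze(-1)

# Example: fuse dummy 64-d embeddings for a batch of 8 behavioral snippets.
k, a, s = (torch.randn(8, 64) for _ in range(3))
pred, modality_weights = AdditiveAttentionFusion()(k, a, s)
print(pred.shape, modality_weights.shape)  # torch.Size([8, 1]) torch.Size([8, 3])
```

The returned `modality_weights` are what supports explainability in this scheme: for each snippet they indicate how much the prediction relied on head motion, facial action units, or speech.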