Deception detection is an active research topic because of its many potential applications, which range from national security (e.g., airport security, jurisprudence, and law enforcement) to real-life applications (e.g., business and computer vision). However, some critical problems remain open and deserve further investigation. One of the major challenges in deception detection is data scarcity. Until now, only one open multi-modal benchmark dataset for human deception detection has been released; it contains 121 video clips (61 in the deceptive class and 60 in the truthful class). This amount of data is too small to train deep neural network-based methods effectively, so existing models often suffer from overfitting and poor generalization. Moreover, the ground truth data contains some frames that are unusable for a variety of reasons. Most of the literature has not addressed these problems. Therefore, in this paper, we first design a series of data preprocessing methods to deal with the unusable frames. Then, we propose a multi-modal deception detection framework to construct our novel emotional state-based feature, and we use the open toolkit openSMILE to extract features from the audio modality. We also design a voting scheme to combine the emotional state information obtained from the visual and audio modalities. Finally, we derive the novel emotion state transformation feature with our self-designed algorithms. In the experiments, we critically analyze the proposed methods and compare them with state-of-the-art multi-modal deception detection methods. The experimental results show that the proposed methods improve the overall performance of multi-modal deception detection, raising accuracy from 87.77% to 92.78% and ROC-AUC from 0.9221 to 0.9265.
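The abstract does not specify the voting scheme or the emotion state transformation algorithm, so the following is only an illustrative sketch of one way such a pipeline could look, not the authors' method. It assumes a hypothetical four-label emotion set, assumes each video segment yields several frame-level visual labels plus one audio label, uses a simple majority vote (ties go to the label seen first), and summarizes the resulting state sequence as normalized transition frequencies. All names (`EMOTIONS`, `vote`, `transition_feature`) are invented for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical label set; the paper's actual emotion categories are not given here.
EMOTIONS = ("angry", "happy", "neutral", "sad")


def vote(visual_labels, audio_label):
    """Majority vote over pooled visual and audio labels for one segment.

    Assumption: every label counts equally; ties go to the label seen first.
    """
    votes = Counter(list(visual_labels) + [audio_label])
    return votes.most_common(1)[0][0]


def transition_feature(states):
    """Return normalized frequencies of transitions between consecutive states.

    The vector has one entry per ordered (previous, next) emotion pair.
    """
    pairs = Counter(zip(states, states[1:]))
    total = sum(pairs.values())
    return [
        pairs[(prev, nxt)] / total if total else 0.0
        for prev, nxt in product(EMOTIONS, repeat=2)
    ]


# Toy input: (frame-level visual labels, audio label) per segment.
segments = [
    (["neutral", "neutral", "happy"], "neutral"),
    (["sad", "sad", "neutral"], "sad"),
    (["sad", "angry", "angry"], "angry"),
]

states = [vote(visual, audio) for visual, audio in segments]
feature = transition_feature(states)
```

A fixed-length vector of this kind could be fed to any standard classifier alongside the audio features; whether the paper normalizes, weights modalities, or encodes transitions differently is not stated in the abstract.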