Although much progress has been made in visual emotion recognition, researchers have realized that modern deep networks tend to exploit dataset characteristics to learn spurious statistical associations between the input and the target. Such dataset characteristics are usually treated as dataset bias, which damages the robustness and generalization performance of these recognition systems. In this work, we scrutinize this problem from the perspective of causal inference, where such a dataset characteristic is termed a confounder that misleads the system into learning the spurious correlation. To alleviate the negative effects of dataset bias, we propose a novel Interventional Emotion Recognition Network (IERN) that realizes backdoor adjustment, a fundamental deconfounding technique in causal inference. A series of designed tests validates the effectiveness of IERN, and experiments on three emotion benchmarks demonstrate that IERN outperforms other state-of-the-art approaches.
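For readers unfamiliar with backdoor adjustment, the following toy sketch illustrates the general formula P(Y | do(X)) = Σ_c P(Y | X, c) P(c) on a discrete confounder. It is not the IERN architecture: the confounder values ("indoor", "outdoor"), the probabilities, and the function names are all hypothetical and chosen only for illustration.

```python
# Toy illustration of backdoor adjustment with a discrete confounder C.
# All names and numbers below are hypothetical; this is not IERN itself.

def observational(p_y_given_xc, p_c_given_x):
    """P(Y=1 | X=x) = sum_c P(Y=1 | X=x, C=c) * P(C=c | X=x)."""
    return sum(p_y_given_xc[c] * p_c_given_x[c] for c in p_c_given_x)


def backdoor_adjust(p_y_given_xc, p_c):
    """P(Y=1 | do(X=x)) = sum_c P(Y=1 | X=x, C=c) * P(C=c)."""
    return sum(p_y_given_xc[c] * p_c[c] for c in p_c)


# Hypothetical confounder: the scene context of an image.
p_c = {"indoor": 0.5, "outdoor": 0.5}          # prior P(C)
p_c_given_x = {"indoor": 0.1, "outdoor": 0.9}  # biased P(C | X=x) in the dataset
p_y_given_xc = {"indoor": 0.2, "outdoor": 0.8} # P(Y="happy" | X=x, C)

p_obs = observational(p_y_given_xc, p_c_given_x)  # weights contexts by the biased P(C | X=x)
p_do = backdoor_adjust(p_y_given_xc, p_c)         # weights contexts by the prior P(C)

print(f"P(Y | X)     = {p_obs:.2f}")
print(f"P(Y | do(X)) = {p_do:.2f}")
```

The only difference between the two quantities is the weighting over the confounder: the observational estimate inherits the dataset's skewed P(C | X), whereas the adjusted estimate averages over the prior P(C), which is what removes the spurious path through the confounder in this toy setting.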