Multi-Branch Deep Radial Basis Function Networks for Facial Emotion Recognition

Emotion recognition (ER) from facial images is one of the landmark tasks in affective computing and has seen major developments over the last decade. Initial efforts on ER relied on handcrafted features that were used to characterize facial images and were then fed to standard predictive models. Recent methodologies comprise end-to-end trainable deep learning methods that jointly learn both the features and the predictive model. Perhaps the most successful models are based on convolutional neural networks (CNNs). While these models have excelled at this task, they still fail to capture local patterns that could emerge during the learning process. We hypothesize that these patterns could be captured by variants based on locally weighted learning. Specifically, in this paper we propose a CNN-based architecture enhanced with multiple branches formed by radial basis function (RBF) units, which aims to exploit local information at the final stage of the learning process. Intuitively, these RBF units capture local patterns shared by similar instances using an intermediate representation; the outputs of the RBFs are then fed to a softmax layer that exploits this information to improve the predictive performance of the model. This feature could be particularly advantageous in ER, as cultural or ethnic differences may be identified by the local units. We evaluate the proposed method on several ER datasets and show that it achieves state-of-the-art performance on some of them, even when we adopt a pre-trained VGG-Face model as the backbone. We show that it is the incorporation of local information that makes the proposed model competitive.
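
The data flow described above (intermediate representation, then RBF branches, then softmax) can be illustrated with a minimal NumPy sketch. It is not the implementation from the paper: the Gaussian form of the RBF units, the concatenation of branch outputs, the use of one shared feature vector for all branches, and every size and name (`rbf_branch`, `multi_branch_rbf_head`, `gamma`, three branches, seven classes) are assumptions made only for illustration. Random vectors stand in for the CNN features.

```python
import numpy as np

def rbf_branch(z, centers, gamma):
    """Gaussian RBF activations exp(-gamma * ||z - c||^2) for each center c (assumed form)."""
    # z: (batch, dim), centers: (units, dim) -> squared distances: (batch, units)
    sq_dist = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def multi_branch_rbf_head(z, branches, W, b):
    """Concatenate the activations of all RBF branches and classify them with softmax."""
    acts = np.concatenate([rbf_branch(z, c, g) for c, g in branches], axis=1)
    return softmax(acts @ W + b)

# Hypothetical sizes; random features stand in for the CNN's intermediate representation.
rng = np.random.default_rng(0)
batch, dim, units, n_branches, n_classes = 4, 16, 5, 3, 7
z = rng.normal(size=(batch, dim))
branches = [(rng.normal(size=(units, dim)), 0.05) for _ in range(n_branches)]
W = rng.normal(size=(n_branches * units, n_classes)) * 0.1
b = np.zeros(n_classes)

probs = multi_branch_rbf_head(z, branches, W, b)
print(probs.shape)  # (4, 7)
```

Each RBF unit responds most strongly to inputs close to its center, which is the sense in which the branches expose local information to the softmax layer. In a trainable model the centers, widths, and softmax weights would be learned together with the backbone rather than drawn at random.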
