Facial Expression Recognition (FER) is an active research domain that has recently made great progress, largely thanks to large deep learning models. However, such approaches are particularly energy-intensive, which makes them difficult to deploy on edge devices. To address this issue, Spiking Neural Networks (SNNs) coupled with event cameras offer a promising alternative, capable of processing sparse and asynchronous events with lower energy consumption. In this paper, we establish the first use of event cameras for FER, a task we name "Event-based FER", and propose the first related benchmarks by converting popular video FER datasets to event streams. To tackle this new task, we propose "Spiking-FER", a deep convolutional SNN model, and compare it against a similar Artificial Neural Network (ANN). Experiments show that the proposed approach achieves performance comparable to the ANN architecture while consuming orders of magnitude less energy (up to 65.39x). In addition, we perform an experimental study of various event-based data augmentation techniques to provide insight into which transformations are most effective for event-based FER.
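To illustrate what converting a video FER dataset into event streams involves, the sketch below implements a minimal frame-difference event simulator in the spirit of tools such as ESIM or v2e: a pixel emits an event whenever its log-intensity has changed by more than a contrast threshold since its last event. This is an illustrative assumption, not the paper's actual conversion pipeline; the function `frames_to_events`, its parameters, and the threshold value are hypothetical.

```python
# Minimal sketch of video-to-event conversion (hypothetical, not the
# paper's pipeline): events fire where log-intensity crosses a threshold.
import numpy as np


def frames_to_events(frames, timestamps, threshold=0.2, eps=1e-6):
    """Convert a (T, H, W) grayscale video into a list of events.

    Each event is a tuple (t, x, y, polarity): polarity is +1 when the
    log-intensity at a pixel has risen by `threshold` since that pixel's
    last event, and -1 when it has fallen by the same amount.
    """
    # Per-pixel log-intensity reference, initialized from the first frame.
    log_ref = np.log(frames[0].astype(np.float64) + eps)
    events = []
    for t, frame in zip(timestamps[1:], frames[1:]):
        log_cur = np.log(frame.astype(np.float64) + eps)
        diff = log_cur - log_ref
        for pol in (1, -1):
            # Pixels whose change (in this polarity) exceeds the threshold.
            ys, xs = np.where(pol * diff >= threshold)
            events.extend((t, int(x), int(y), pol) for x, y in zip(xs, ys))
            # Reset the reference at pixels that just fired.
            log_ref[ys, xs] = log_cur[ys, xs]
    return events


if __name__ == "__main__":
    # Toy usage: an 8-frame random 4x4 "video" with integer timestamps.
    rng = np.random.default_rng(0)
    video = rng.integers(0, 256, size=(8, 4, 4), dtype=np.uint8)
    evts = frames_to_events(video, timestamps=list(range(8)))
    print(f"{len(evts)} events, first: {evts[0] if evts else None}")
```

Dedicated simulators additionally interpolate between frames to assign sub-frame timestamps and model sensor noise; this sketch omits both and emits at most one event per pixel, polarity, and frame.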