The success of deep learning models depends on the size and quality of the dataset used to solve a given task. Here, we explore how far generated data can complement real data in improving the performance of neural networks. We consider facial expression recognition (FER), since it requires challenging data generation at the level of local facial regions such as the mouth and eyebrows, rather than simple augmentation. Generative Adversarial Networks (GANs) provide an alternative way to generate such local deformations, but they require further validation. To answer our question, we use non-complex Convolutional Neural Network (CNN) classifiers for recognizing Ekman emotions. For data generation, we produce facial expressions (FEs) with two GANs: the first generates a random identity, while the second imposes facial deformations on top of it. We train the CNN classifier using FEs from three sources: real faces, GAN-generated faces, and a combination of both. We determine an upper bound on the amount of generated data that, when mixed with the real data, contributes most to improving FER accuracy. In our experiments, we find that adding five times more synthetic data than the real FE dataset increases accuracy by 16%.
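As a concrete illustration of the experimental setup, the sketch below mixes real and GAN-generated facial-expression images and trains a small CNN classifier on the combined set. It is written in PyTorch with hypothetical dataset paths (`data/real_faces`, `data/gan_generated`), an assumed six-class Ekman label set, and an illustrative architecture; it is a minimal sketch under these assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of the data-mixing experiment:
# a simple CNN classifier for Ekman emotions trained on a concatenation of
# real facial-expression images and GAN-generated ones. Paths, image size,
# and architecture are illustrative assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, ConcatDataset
from torchvision import datasets, transforms

NUM_CLASSES = 6  # Ekman's six basic emotions (assumed label set)

transform = transforms.Compose([
    transforms.Grayscale(),
    transforms.Resize((48, 48)),
    transforms.ToTensor(),
])

# Hypothetical folder layout: one sub-folder per emotion class.
# In the best-performing setup reported above, the synthetic set is
# roughly five times larger than the real one.
real_ds = datasets.ImageFolder("data/real_faces", transform=transform)
synth_ds = datasets.ImageFolder("data/gan_generated", transform=transform)
mixed_ds = ConcatDataset([real_ds, synth_ds])  # real + synthetic training set
loader = DataLoader(mixed_ds, batch_size=64, shuffle=True)

# A deliberately simple ("non-complex") CNN classifier.
model = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 12 * 12, 128), nn.ReLU(),
    nn.Linear(128, NUM_CLASSES),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

The same loop can be run on the real-only and synthetic-only datasets by swapping `mixed_ds` for `real_ds` or `synth_ds`, which reproduces the three training conditions compared in the study.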