Synthesizing realistic data samples is of great value to both the academic and industrial communities. Deep generative models have become an emerging topic across various research areas such as computer vision and signal processing. Affective computing, a topic of broad interest in the computer vision community, has been no exception and has benefited greatly from generative models. In fact, affective computing has seen a rapid proliferation of generative models over the last two decades. Applications of such models include, but are not limited to, emotion recognition and classification, unimodal emotion synthesis, and cross-modal emotion synthesis. Motivated by this, we review recent advances in human emotion synthesis by studying the available databases and the advantages and disadvantages of the generative models, along with the related training strategies, considering two principal modalities of human communication, namely audio and video. In this context, facial expression synthesis, speech emotion synthesis, and audio-visual (cross-modal) emotion synthesis are reviewed extensively under different application scenarios. Finally, we discuss open research problems to push the boundaries of this research area in future work.