Personality computing and affective computing have recently attracted interest in many research areas. Datasets for these tasks generally contain multiple modalities, such as video, audio, language, and bio-signals. In this paper, we propose a flexible model for the task that exploits all available data. Because the task involves complex relations, and to avoid using a large model specifically for video processing, we propose behaviour encoding, which boosts performance with minimal change to the model. Cross-attention with transformers, which has recently become popular, is used to fuse the different modalities. Since long-term relations may exist, breaking the input into chunks is undesirable, so the proposed model processes the entire input together. Our experiments show the importance of each of these contributions.
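To make the fusion idea concrete, the following is a minimal sketch of single-head scaled dot-product cross-attention between two modalities, applied to whole sequences without chunking. It is an illustration only and not the proposed model: the feature dimensions, sequence lengths, random toy inputs, and the names `cross_attention`, `audio`, and `video` are assumptions, and the learned projections, multiple heads, and behaviour encoding of a real transformer are omitted.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def cross_attention(queries, keys, values):
    """Each query step attends over every key step of the other modality.

    No learned projections are applied (an assumption made for brevity).
    """
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))  # shape: (T_q x T_k)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values), weights


# Hypothetical toy features: 6 audio steps and 9 video steps, dimension 4.
random.seed(0)
audio = [[random.gauss(0, 1) for _ in range(4)] for _ in range(6)]
video = [[random.gauss(0, 1) for _ in range(4)] for _ in range(9)]

# Audio queries attend over the entire video sequence at once.
fused, weights = cross_attention(audio, video, video)
```

Because the attention weights span every step of the other modality, a relation between distant time steps can in principle be captured, which is the motivation given above for not splitting the input into chunks.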