Group affect refers to the subjective emotion evoked in a group by an external stimulus, and it is an important factor shaping group behavior and outcomes. Recognizing group affect involves identifying the important individuals and salient objects in a crowd that can evoke emotions. Most existing methods detect faces and objects with pre-trained detectors and then summarize the detections into a group emotion using specific rules. However, such affective region selection mechanisms are heuristic and sensitive to imperfect face and object detections from the pre-trained detectors. Moreover, faces and objects in group-level images are often contextually related, and how important faces and objects interact with each other remains an open question. In this work, we incorporate a concept from psychology called the Most Important Person (MIP), which represents the most noteworthy face in the crowd and carries affective semantic meaning. We propose the Dual-branch Cross-Patch Attention Transformer (DCAT), which takes the global image and the MIP together as inputs. Specifically, we first learn the informative facial regions produced by the MIP and the global context separately. A Cross-Patch Attention module then fuses the MIP and global-context features so that they complement each other. With about 10x fewer parameters, the proposed DCAT outperforms state-of-the-art methods on two group valence prediction datasets, GAF 3.0 and GroupEmoW. Moreover, the proposed model can be transferred to another group affect task, group cohesion, where it shows comparable results.
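To make the fusion step concrete, the following is a minimal NumPy sketch of bidirectional cross-attention between two token sets, one standing in for the global-image branch and one for the MIP branch. It is an illustration under stated assumptions, not the DCAT Cross-Patch Attention module itself: the single-head formulation, the residual connection, the mean pooling, the token counts, the feature width, and the names `cross_attention`, `fuse`, `g_from_m`, and `m_from_g` are all hypothetical choices made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: `queries` tokens attend over `context` tokens.

    Returns the residual-updated query tokens and the attention weights.
    """
    q = queries @ w_q                          # (n_q, d)
    k = context @ w_k                          # (n_c, d)
    v = context @ w_v                          # (n_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (n_q, n_c)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return queries + weights @ v, weights


def fuse(global_tokens, mip_tokens, params):
    """Let each branch query the other, then pool and concatenate (assumed design)."""
    g_out, _ = cross_attention(global_tokens, mip_tokens, *params["g_from_m"])
    m_out, _ = cross_attention(mip_tokens, global_tokens, *params["m_from_g"])
    return np.concatenate([g_out.mean(axis=0), m_out.mean(axis=0)])


# Hypothetical sizes: 49 global patch tokens, 9 MIP patch tokens, width 16.
rng = np.random.default_rng(0)
d = 16
global_tokens = rng.normal(size=(49, d))
mip_tokens = rng.normal(size=(9, d))

# Random projection matrices stand in for learned weights.
params = {
    name: [rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3)]
    for name in ("g_from_m", "m_from_g")
}

fused = fuse(global_tokens, mip_tokens, params)
print(fused.shape)  # (32,)
```

In this sketch each branch keeps its own tokens and adds information attended from the other branch, which is one simple way to let the MIP and global-context features complement each other. A trained model would learn the projection matrices and would typically feed the fused vector to a classification head.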