Human group detection, which partitions a crowd of people into groups, is an important step in video-based analysis of human social activity. The core of human group detection lies in representing human social relations and partitioning people according to them. In this paper, we propose a new two-stage, multi-head framework for human group detection. In the first stage, we propose a human behavior simulator head that learns a social relation feature embedding; this head is trained in a self-supervised manner by leveraging socially grounded relationships among the behaviors of multiple people. In the second stage, based on the social relation embedding, we develop a self-attention-inspired network for human group detection. Strong performance on two large-scale benchmarks, PANDA and JRDB-Group, verifies the effectiveness of the proposed framework. Benefiting from the self-supervised social relation embedding, our method achieves promising results with very little labeled training data. We will release the source code to the public.
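The abstract does not specify the architecture of the second-stage grouping network. As a rough illustration only, the sketch below assumes that each detected person already has a social relation embedding (here, hand-made 2-D vectors). It computes a self-attention-style affinity matrix (scaled dot products followed by a row-wise softmax, then symmetrized) and forms groups as connected components of the pairs whose affinity exceeds a threshold. The function names `attention_affinity` and `group_people`, the threshold value, and the connected-components step are all hypothetical choices, not the authors' method.

```python
import math
from typing import List


def attention_affinity(emb: List[List[float]]) -> List[List[float]]:
    """Hypothetical self-attention-style affinity between people.

    Scaled dot products -> row-wise softmax -> symmetrized, so that
    affinity[i][j] == affinity[j][i].
    """
    n, d = len(emb), len(emb[0])
    scale = 1.0 / math.sqrt(d)
    logits = [[sum(a * b for a, b in zip(emb[i], emb[j])) * scale
               for j in range(n)] for i in range(n)]
    attn = []
    for row in logits:
        m = max(row)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        attn.append([e / total for e in exps])
    return [[0.5 * (attn[i][j] + attn[j][i]) for j in range(n)]
            for i in range(n)]


def group_people(emb: List[List[float]], threshold: float) -> List[List[int]]:
    """Hypothetical grouping: connected components of the thresholded affinities."""
    aff = attention_affinity(emb)
    n = len(emb)
    parent = list(range(n))  # union-find forest over person indices

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if aff[i][j] > threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Toy embeddings: persons 0 and 1 point one way, persons 2 and 3 the other.
    people = [[3.0, 0.0], [3.0, 0.1], [0.0, 3.0], [0.1, 3.0]]
    print(group_people(people, threshold=0.1))
```

Because the row-wise softmax spreads probability mass over all people in the scene, a fixed threshold in this sketch behaves differently as crowd size changes; it is kept only to make the attention-to-groups step concrete.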