Multi-label activity recognition aims to recognize multiple activities that are performed simultaneously or sequentially in a video. Most recent activity recognition networks target single-activity recognition and assume that each video contains only one activity. These networks extract a single set of features shared across all activities, a design that is not suited to multi-label recognition. We introduce an approach to multi-label activity recognition that extracts an independent feature descriptor for each activity and learns the correlations between activities. This structure can be trained end-to-end and plugged into any existing network for video classification. Our method outperformed state-of-the-art approaches on four multi-label activity recognition datasets. To better understand the activity-specific features that the system generated, we visualized them on the Charades dataset.
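The abstract does not say how the per-activity descriptors or the activity correlations are computed, so the following is only an illustrative sketch of one way such a head could sit on top of a shared video backbone. It assumes (these are assumptions, not details from the abstract) that each activity has its own attention query that pools the shared features into an activity-specific descriptor, that each descriptor has its own linear classifier, and that correlations are modeled by a learned activity-by-activity matrix that mixes the per-activity logits. The names `activity_specific_head`, `queries`, `cls_w`, `cls_b`, and `corr` are hypothetical.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def activity_specific_head(features, queries, cls_w, cls_b, corr):
    """Hypothetical multi-label head (an assumption, not the paper's design).

    features: (T, D) shared backbone features at T spatio-temporal positions
    queries:  (A, D) one attention query per activity
    cls_w:    (A, D) one linear classifier weight vector per activity
    cls_b:    (A,)   one classifier bias per activity
    corr:     (A, A) activity-correlation matrix applied to the logits
    """
    # Each activity attends over the T positions independently.
    attn = softmax(features @ queries.T, axis=0)      # (T, A), columns sum to 1
    # Activity-specific descriptors: one attention-pooled vector per activity.
    descriptors = attn.T @ features                   # (A, D)
    # Independent per-activity logits from each activity's own descriptor.
    logits = np.einsum("ad,ad->a", descriptors, cls_w) + cls_b   # (A,)
    # Let correlated activities adjust each other's scores (residual mixing).
    mixed = logits + corr @ logits                    # (A,)
    # Sigmoid, not softmax: several activities may be present at once.
    return descriptors, sigmoid(mixed)


# Toy usage with random values standing in for trained parameters.
rng = np.random.default_rng(0)
T, D, A = 16, 8, 5
features = rng.normal(size=(T, D))
queries = rng.normal(size=(A, D))
cls_w = rng.normal(size=(A, D))
cls_b = np.zeros(A)
corr = 0.1 * rng.normal(size=(A, A))

descriptors, probs = activity_specific_head(features, queries, cls_w, cls_b, corr)
```

In this sketch the head only needs a `(T, D)` feature array, which is how it could be attached to an arbitrary backbone. In a real system the parameters would be trained jointly with the backbone, for example with a per-activity binary cross-entropy loss.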