Temporal action localization (TAL) is a widely studied task because of its broad application potential. Existing work in this field suffers from two main weaknesses: (1) it often neglects the multi-label case and focuses only on temporal modeling; (2) it ignores the semantic information in class labels and uses only visual information. To address these problems, we propose a novel Co-Occurrence Relation Module (CORM) that explicitly models the co-occurrence relationships between actions. In addition to visual information, it uses the semantic embeddings of class labels to model these relationships. CORM works in a plug-and-play manner and can be easily incorporated into existing sequence models. By considering both visual and semantic co-occurrence, our method achieves a high capacity for modeling multi-label relationships. Meanwhile, existing TAL datasets predominantly focus on low-semantic atomic actions. We therefore construct a challenging multi-label dataset, UCF-Crime-TAL, that focuses on high-semantic actions by annotating the UCF-Crime dataset at the frame level and accounting for the semantic overlap of different events. Extensive experiments on two commonly used TAL datasets, \textit{i.e.}, MultiTHUMOS and TSU, and on our newly proposed UCF-Crime-TAL demonstrate the effectiveness of the proposed CORM, which achieves state-of-the-art performance on these datasets.
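To make the two kinds of co-occurrence concrete, the following is a minimal illustrative sketch, not the CORM architecture itself, whose internals the abstract does not specify. It assumes frame-level binary multi-label annotations and one embedding vector per class label; the function names `label_cooccurrence` and `semantic_similarity`, and the toy data, are hypothetical. The first function estimates how often class j is active given that class i is active; the second measures how close two class-label embeddings are by cosine similarity.

```python
import numpy as np


def label_cooccurrence(labels: np.ndarray) -> np.ndarray:
    """Estimate P(class j active | class i active) from frame-level labels.

    labels: (T, C) binary matrix, one row per frame, one column per class.
    Returns a (C, C) matrix; rows of classes that never occur stay zero.
    """
    labels = labels.astype(np.float64)
    # counts[i, j] = number of frames in which classes i and j are both active
    counts = labels.T @ labels
    # The diagonal holds the number of frames in which each class is active
    occurrences = np.diag(counts).copy()
    # Avoid division by zero for classes that never appear
    return counts / np.maximum(occurrences, 1.0)[:, None]


def semantic_similarity(embeddings: np.ndarray) -> np.ndarray:
    """Cosine similarity between class-label embeddings.

    embeddings: (C, D) matrix, one row per class label.
    Returns a (C, C) symmetric matrix.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.maximum(norms, 1e-12)
    return unit @ unit.T


# Toy data (hypothetical): 4 frames, 3 classes, 2-dimensional label embeddings
toy_labels = np.array([[1, 1, 0],
                       [1, 0, 0],
                       [0, 0, 1],
                       [1, 1, 0]])
toy_embeddings = np.array([[1.0, 0.0],
                           [0.0, 1.0],
                           [1.0, 1.0]])

cooc = label_cooccurrence(toy_labels)
sim = semantic_similarity(toy_embeddings)
```

In this toy example, class 1 only ever appears together with class 0, so the conditional estimate is asymmetric: `cooc[1, 0]` is larger than `cooc[0, 1]`. A learned module would presumably model such relations from features rather than from raw counts; the sketch only illustrates the statistics involved.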