Temporal action proposal generation (TAPG) is a challenging task that aims to locate action instances, together with their temporal boundaries, in untrimmed videos. To evaluate the confidence of proposals, existing works typically predict an action score for each proposal, supervised by the temporal Intersection-over-Union (tIoU) between the proposal and the ground truth. In this paper, we propose a general auxiliary Background Constraint idea that further suppresses low-quality proposals by using the background prediction score to restrict proposal confidence. As a result, the Background Constraint concept can easily be plugged into existing TAPG methods (e.g., BMN, GTAD). Building on this idea, we propose the Background Constraint Network (BCNet) to further exploit the rich information carried by both action and background. Specifically, we introduce an Action-Background Interaction module for reliable confidence evaluation, which models the inconsistency between action and background with attention mechanisms at the frame and clip levels. Extensive experiments on two popular benchmarks, ActivityNet-1.3 and THUMOS14, demonstrate that our method outperforms state-of-the-art methods. Equipped with an existing action classifier, our method also achieves remarkable performance on the temporal action localization task.
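To make the scoring setup concrete, the sketch below computes the tIoU between a proposal and ground-truth segments, which is the quantity used to supervise proposal confidence. It also shows one way a background score could restrict that confidence. The target choice (maximum tIoU over all ground-truth instances) is a common convention assumed here, and the fusion rule `constrained_confidence` (action score scaled by one minus the background probability) is a hypothetical illustration, not the formula used by BCNet. All function names are illustrative.

```python
# Minimal sketch: tIoU training targets plus a HYPOTHETICAL background-constrained
# confidence. The fusion rule below is an assumption, not BCNet's actual formula.
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end), e.g. in seconds


def temporal_iou(proposal: Segment, gt: Segment) -> float:
    """Temporal Intersection-over-Union between two 1-D segments."""
    inter = max(0.0, min(proposal[1], gt[1]) - max(proposal[0], gt[0]))
    union = (proposal[1] - proposal[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def max_tiou_target(proposal: Segment, gts: List[Segment]) -> float:
    """Assumed supervision target: best tIoU of the proposal over all ground truths."""
    return max((temporal_iou(proposal, g) for g in gts), default=0.0)


def constrained_confidence(action_score: float, background_score: float) -> float:
    """Hypothetical rule: a high predicted background probability suppresses
    the proposal's confidence, even when its action score is high."""
    return action_score * (1.0 - background_score)


if __name__ == "__main__":
    proposal = (2.0, 6.0)
    gts = [(4.0, 8.0), (1.0, 5.0)]
    print(temporal_iou(proposal, gts[0]))      # overlap 2 s / union 6 s, about 0.333
    print(max_tiou_target(proposal, gts))      # best match is (1, 5), about 0.6
    print(constrained_confidence(0.9, 0.8))    # likely background, so about 0.18
```

The multiplicative form is chosen only because it keeps the confidence in [0, 1] and leaves the action score unchanged when the background score is zero. Any monotone rule that lowers confidence as the background score rises would express the same constraint.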