Temporal action localization (TAL) aims to detect the boundaries and identify the class of each action instance in a long untrimmed video. Current approaches treat video frames homogeneously and tend to give excessive attention to the background and key objects, which limits their sensitivity in localizing action boundaries. To address this, we propose a prior-enhanced temporal action localization method (PETAL), which takes only RGB input and incorporates action subjects as priors. PETAL leverages the action subjects' information through a plug-and-play subject-aware spatial attention module (SA-SAM) to generate an aggregated, subject-prioritized representation. Experimental results on the THUMOS-14 and ActivityNet-1.3 datasets demonstrate that PETAL achieves competitive performance using only RGB features: on THUMOS-14, it boosts mAP by 2.41% over the state-of-the-art approach that uses RGB features alone, and by 0.25% over the approach that additionally uses optical flow features.
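The abstract does not specify the SA-SAM architecture. As a rough illustration only, the general idea of biasing spatial attention pooling with a subject prior could be sketched as follows; the function name, the `beta` parameter, and the way the prior enters the attention logits are all assumptions for this sketch, not PETAL's actual design.

```python
import numpy as np

def subject_aware_spatial_attention(feat, subject_prior, beta=2.0):
    """Illustrative subject-aware spatial attention pooling (not PETAL's design).

    feat:          (H, W, C) per-frame feature map
    subject_prior: (H, W) heatmap marking action-subject regions (assumed given,
                   e.g. from a person detector)
    beta:          hypothetical scalar controlling the prior's strength
    Returns a (C,) aggregated representation that prioritizes subject regions.
    """
    H, W, C = feat.shape
    # Base saliency: L2 norm of each spatial feature vector.
    saliency = np.linalg.norm(feat, axis=-1)          # (H, W)
    # Bias the attention logits toward subject regions via the prior.
    logits = saliency + beta * subject_prior          # (H, W)
    # Softmax over all H*W spatial positions.
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Attention-weighted aggregation into a single feature vector.
    return (weights[..., None] * feat).reshape(-1, C).sum(axis=0)

# Toy usage: a 4x4 feature map where the subject occupies the top-left 2x2 block.
rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 4, 8))
prior = np.zeros((4, 4))
prior[:2, :2] = 1.0
pooled = subject_aware_spatial_attention(feat, prior)
print(pooled.shape)  # (8,)
```

Under this toy scheme, raising `beta` shifts the softmax mass toward the subject region, so background positions contribute less to the pooled vector, mirroring the stated goal of a subject-prioritized representation.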