Temporal Action Localization (TAL), the task of predicting the start time, end time, and class label of each action in a video, has many real-world applications. Because of its complexity, however, results on TAL still lag behind those on action recognition. The difficulty lies in predicting precise start and end times for different actions in arbitrary videos. In this paper, we propose a new network based on the Gated Recurrent Unit (GRU) and two novel post-processing ideas for the TAL task. Specifically, we propose a new design for the output layer of the GRU, resulting in the so-called GRU-Splitted model. Moreover, linear interpolation is used to generate action proposals with precise start and end times. Finally, to rank the generated proposals appropriately, we use a Learn to Rank (LTR) approach. We evaluate the proposed method on the Thumos14 dataset. The results show that the proposed method outperforms state-of-the-art methods. In particular, at an Intersection over Union (IoU) threshold of 0.7, it achieves a mean Average Precision (mAP) of 27.52%, which is 5.12% better than that of state-of-the-art methods.
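To make the two quantities mentioned above concrete, the following is a minimal sketch, not the authors' implementation. The temporal IoU function follows the standard definition for one-dimensional segments. The interpolation function is an assumption about how linear interpolation could place proposal boundaries between discrete time steps: the per-step actionness scores, the 0.5 threshold, and the names `temporal_iou` and `interpolated_segments` are all hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_time, end_time) in seconds


def temporal_iou(a: Segment, b: Segment) -> float:
    """Standard IoU of two time segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def interpolated_segments(
    scores: List[float], times: List[float], threshold: float = 0.5
) -> List[Segment]:
    """Hypothetical proposal generator (an assumption, not the paper's method).

    Finds runs where the per-step score is at or above `threshold` and
    places each boundary at the linearly interpolated time at which the
    score crosses the threshold between two neighbouring steps.
    """
    segments: List[Segment] = []
    start = times[0] if scores and scores[0] >= threshold else None
    for i in range(1, len(scores)):
        prev, cur = scores[i - 1], scores[i]
        step = times[i] - times[i - 1]
        if prev < threshold <= cur:
            # Rising edge: interpolate the crossing time for the start.
            frac = (threshold - prev) / (cur - prev)
            start = times[i - 1] + frac * step
        elif prev >= threshold > cur:
            # Falling edge: interpolate the crossing time for the end.
            frac = (prev - threshold) / (prev - cur)
            segments.append((start, times[i - 1] + frac * step))
            start = None
    if start is not None:
        # The score was still above the threshold at the last step.
        segments.append((start, times[-1]))
    return segments


# Toy usage with made-up scores sampled once per second.
proposals = interpolated_segments([0.0, 1.0, 1.0, 0.0], [0.0, 1.0, 2.0, 3.0])
overlap = temporal_iou(proposals[0], (1.0, 3.0))
```

In this toy input the boundaries fall halfway between samples rather than on a sampled time step, which is the kind of sub-step precision that matters at a strict IoU threshold such as 0.7.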