Temporal Action Localization (TAL), the task of predicting the start and end times of each action in a video together with its class label, has numerous real-world applications. Owing to the complexity of the task, however, acceptable accuracy has not yet been achieved, unlike in the related task of action recognition. In this paper, we propose a new network based on the Gated Recurrent Unit (GRU) and two novel post-processing methods for the TAL task. Specifically, we propose a new design for the output layer of the conventional GRU, resulting in the so-called GRU-Split network. Moreover, linear interpolation is used to generate action proposals with precise start and end times. Finally, to rank the generated proposals appropriately, we use a Learn to Rank (LTR) approach. We evaluate the proposed method on the Thumos14 and ActivityNet-1.3 datasets. The results show that it outperforms state-of-the-art methods. In particular, at an Intersection over Union (IoU) threshold of 0.7 on Thumos14, it achieves a mean Average Precision (mAP) of 27.52%, which is 5.12% higher than that of state-of-the-art methods.
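As an illustration of two quantities mentioned above, the following minimal Python sketch computes the temporal IoU between two segments and uses linear interpolation to place proposal boundaries between sampled frames. It is not the paper's implementation: the abstract does not specify how interpolation is applied, so the threshold-crossing scheme, the function names (`temporal_iou`, `crossing_time`, `proposals_from_scores`), and the default threshold of 0.5 are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over Union of two time segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def crossing_time(t0: float, p0: float, t1: float, p1: float,
                  threshold: float) -> float:
    """Time at which a score crosses `threshold` between two samples,
    assuming the score varies linearly from (t0, p0) to (t1, p1)."""
    return t0 + (threshold - p0) * (t1 - t0) / (p1 - p0)


def proposals_from_scores(times: Sequence[float], scores: Sequence[float],
                          threshold: float = 0.5) -> List[Segment]:
    """Hypothetical proposal generator: turn per-frame action scores into
    segments whose boundaries are interpolated between frames."""
    proposals: List[Segment] = []
    # A segment is already open if the first sample is above the threshold.
    start = times[0] if scores[0] >= threshold else None
    for i in range(1, len(scores)):
        prev, cur = scores[i - 1], scores[i]
        if prev < threshold <= cur:
            # Upward crossing: interpolated start time.
            start = crossing_time(times[i - 1], prev, times[i], cur, threshold)
        elif prev >= threshold > cur and start is not None:
            # Downward crossing: interpolated end time closes the segment.
            end = crossing_time(times[i - 1], prev, times[i], cur, threshold)
            proposals.append((start, end))
            start = None
    if start is not None:
        # Score is still above the threshold at the last sample.
        proposals.append((start, times[-1]))
    return proposals


if __name__ == "__main__":
    times = [0.0, 1.0, 2.0, 3.0, 4.0]
    scores = [0.0, 1.0, 1.0, 1.0, 0.0]
    predicted = proposals_from_scores(times, scores)
    ground_truth = (1.0, 4.0)
    for segment in predicted:
        print(segment, temporal_iou(segment, ground_truth))
```

Under these assumptions, a proposal counts as correct at IoU 0.7 only if `temporal_iou` with a ground-truth segment of the same class is at least 0.7, which is why sub-frame boundary placement can matter at strict thresholds.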