We propose a Temporal Voting Network (TVNet) for action localization in untrimmed videos. TVNet incorporates a novel Voting Evidence Module that locates temporal boundaries more accurately by accumulating temporal contextual evidence to predict frame-level probabilities of start and end action boundaries. Our action-independent evidence module is incorporated within a pipeline to calculate confidence scores and action classes. We achieve an average mAP of 34.6% on ActivityNet-1.3, particularly outperforming previous methods at the highest IoU threshold of 0.95. On THUMOS14, TVNet achieves an mAP of 56.0% when combined with PGCN and 59.1% when combined with MUSES at 0.5 IoU, and outperforms prior work at all thresholds. Our code is available at https://github.com/hanielwang/TVNet.
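The IoU thresholds quoted above (0.5 and 0.95) refer to the temporal overlap between a predicted action segment and a ground-truth segment. As a minimal sketch, assuming each segment is a `(start, end)` pair in seconds, the overlap test can be written as follows. The function names `temporal_iou` and `is_match` are illustrative and are not taken from the TVNet repository.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments, in seconds."""
    # Length of the overlap; zero when the segments are disjoint.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union = sum of both lengths minus the overlap counted twice.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, gt, threshold):
    """True when the prediction overlaps the ground truth enough to count as a hit."""
    return temporal_iou(pred, gt) >= threshold


# Hypothetical segments: the prediction starts 2 s too early.
pred, gt = (10.0, 20.0), (12.0, 20.0)
print(temporal_iou(pred, gt))    # overlap 8 s over a 10 s union
print(is_match(pred, gt, 0.5))   # lenient threshold
print(is_match(pred, gt, 0.95))  # strict threshold
```

In this example a 2-second boundary error on a 10-second segment still counts as a hit at 0.5 IoU but not at 0.95, which is why results at the strictest threshold are sensitive to boundary accuracy.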