Temporal Action Localization (TAL) aims to predict both the action category and the temporal boundaries, i.e., the start and end times, of action instances in untrimmed videos. Most existing work adopts fully supervised solutions, which have proven effective. A practical bottleneck of these solutions is the large amount of labeled training data they require. To reduce the cost of human labeling, this paper focuses on a rarely investigated yet practical task, semi-supervised TAL, and proposes an effective active learning method named AL-STAL. AL-STAL uses four steps, named \emph{Train, Query, Annotate, Append}, to actively select highly informative video samples and train the localization model. It is equipped with two scoring functions that consider the uncertainty of the localization model and thereby facilitate the ranking and selection of video samples. The first, Temporal Proposal Entropy (TPE), takes the entropy of the predicted label distribution as the measure of uncertainty. The second, Temporal Context Inconsistency (TCI), introduces a new metric based on the mutual information between adjacent action proposals to evaluate the informativeness of video samples. To validate the effectiveness of the proposed method, we conduct extensive experiments on two benchmark datasets, THUMOS'14 and ActivityNet 1.3. Experimental results show that AL-STAL outperforms existing competitors and achieves satisfactory performance compared with fully supervised learning.
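
To make the entropy-based scoring and the four-step loop concrete, the following is a minimal sketch and not the authors' implementation. The abstract states only that TPE uses the entropy of the predicted label distribution; averaging the entropy over a video's proposals, the function names, and the `train`, `predict`, and `annotate` callables are all assumptions introduced for illustration. TCI is not sketched because the abstract does not give its exact form.

```python
import math
from typing import Callable, Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def temporal_proposal_entropy(proposal_probs: Sequence[Sequence[float]]) -> float:
    """Video-level uncertainty score.

    Assumption: the per-proposal entropies are averaged over the video;
    the abstract does not specify the aggregation.
    """
    if not proposal_probs:
        return 0.0
    return sum(entropy(p) for p in proposal_probs) / len(proposal_probs)


def query(predictions: Dict[str, List[List[float]]], budget: int) -> List[str]:
    """Query step: return the ids of the `budget` most uncertain videos."""
    ranked = sorted(
        predictions,
        key=lambda vid: temporal_proposal_entropy(predictions[vid]),
        reverse=True,
    )
    return ranked[:budget]


def active_learning_round(
    labeled: Dict[str, object],
    unlabeled: List[str],
    train: Callable,      # hypothetical: labeled set -> model
    predict: Callable,    # hypothetical: (model, video id) -> per-proposal class probabilities
    annotate: Callable,   # hypothetical: video id -> human annotation
    budget: int,
) -> List[str]:
    """One Train, Query, Annotate, Append cycle; mutates `labeled` and `unlabeled`."""
    model = train(labeled)                                   # Train
    predictions = {vid: predict(model, vid) for vid in unlabeled}
    picked = query(predictions, budget)                      # Query
    for vid in picked:
        labeled[vid] = annotate(vid)                         # Annotate, then Append
        unlabeled.remove(vid)
    return picked
```

In this sketch, a video whose proposals receive near-uniform class probabilities ranks ahead of one whose proposals are classified confidently, so the annotation budget goes to the videos the current model is least sure about.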