Training temporal action detection models for videos requires large amounts of labeled data, yet such annotation is expensive to collect. Incorporating unlabeled or weakly-labeled data when training an action detection model could help reduce annotation cost. In this work, we first introduce the Semi-supervised Action Detection (SSAD) task, which uses a mixture of labeled and unlabeled data, and analyze the different types of errors made by the proposed SSAD baselines, which are directly adapted from the semi-supervised classification task. To alleviate the main error in the SSAD baselines, action incompleteness (i.e., missing parts of actions), we further design an unsupervised foreground attention (UFA) module that exploits the "independence" between foreground and background motion. We then incorporate weakly-labeled data into SSAD and propose Omni-supervised Action Detection (OSAD) with three levels of supervision. To help overcome the accompanying action-context confusion problem in the OSAD baselines, we design an information bottleneck (IB) that suppresses the scene information in non-action frames while preserving the action information. We extensively benchmark against the baselines for SSAD and OSAD on our created data splits of THUMOS14 and ActivityNet1.2, and demonstrate the effectiveness of the proposed UFA and IB methods. Lastly, we show the benefit of our full OSAD-IB model under limited annotation budgets by exploring the optimal annotation strategy for labeled, unlabeled and weakly-labeled data.