Recently, temporal action localization (TAL), i.e., finding specific action segments in untrimmed videos, has attracted increasing attention from the computer vision community. State-of-the-art solutions for TAL involve evaluating frame-level probabilities of three action-indicating phases, i.e., starting, continuing, and ending, and then post-processing these predictions for the final localization. This paper delves deep into this mechanism and argues that existing methods, by modeling these phases as individual classification tasks, ignore the potential temporal constraints between them. This can lead to incorrect and/or inconsistent predictions when some frames of the video input lack sufficient discriminative information. To alleviate this problem, we introduce two regularization terms that mutually regularize the learning procedure: the Intra-phase Consistency (IntraC) regularization makes the predictions consistent within each phase, and the Inter-phase Consistency (InterC) regularization keeps the predictions consistent across phases. By jointly optimizing these two terms, the entire framework becomes aware of these potential constraints during end-to-end optimization. Experiments are performed on two popular TAL datasets, THUMOS14 and ActivityNet1.3. Our approach clearly outperforms the baseline both quantitatively and qualitatively, and the proposed regularization also generalizes to other TAL methods (e.g., TSA-Net and PGCN). Code: https://github.com/PeisenZhao/Bottom-Up-TAL-with-MR
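To make the two regularizers concrete, below is a minimal, hypothetical PyTorch sketch of what such consistency terms could look like. The smoothness-based form of IntraC and the "started-and-not-yet-ended" form of InterC are illustrative assumptions, as are the tensor names; they are not the authors' exact formulation, which is available in the linked repository.

```python
# Hypothetical sketch of the two consistency regularizers described above.
# The exact loss forms are assumptions, NOT the paper's formulation; see
# https://github.com/PeisenZhao/Bottom-Up-TAL-with-MR for the real code.
import torch

def intra_c(p_start, p_continue, p_end):
    """IntraC (assumed form): encourage temporally consistent predictions
    within each phase by penalizing frame-to-frame differences."""
    loss = 0.0
    for p in (p_start, p_continue, p_end):  # each: (T,) frame-level probabilities
        loss = loss + (p[1:] - p[:-1]).abs().mean()
    return loss

def inter_c(p_start, p_continue, p_end):
    """InterC (assumed form): a frame should only be 'continuing' if an action
    has started and has not yet ended, tying the three phases together."""
    # Cumulative evidence that an action has started / ended by each frame.
    started = torch.cummax(p_start, dim=0).values
    ended = torch.cummax(p_end, dim=0).values
    inside = (started * (1.0 - ended)).clamp(0.0, 1.0)
    return (p_continue - inside).abs().mean()

# Usage: add both terms to the base per-phase classification loss.
T = 100
p_s, p_c, p_e = (torch.rand(T) for _ in range(3))
total_reg = intra_c(p_s, p_c, p_e) + inter_c(p_s, p_c, p_e)
```

The key design point this sketch illustrates is that both terms are differentiable functions of the phase probabilities alone, so they can simply be added to the classification loss and optimized end to end, as the abstract describes.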

