Online Action Detection (OAD) in videos is formulated as a per-frame labeling task for real-time settings in which only past and current video frames are observable. This paper presents a novel learning-with-privileged-information framework for online action detection, in which future frames, observable only during training, are treated as a form of privileged information. Knowledge distillation is employed to transfer this privileged information from an offline teacher to an online student. We note that this setting differs from conventional knowledge distillation (KD) because the gap between the teacher and student models lies mostly in the input data rather than in the network architecture. We propose Privileged Knowledge Distillation (PKD), which (i) schedules a curriculum learning procedure and (ii) inserts auxiliary nodes into the student model, both to shrink the information gap and to improve learning performance. Compared with other OAD methods that explicitly predict future frames, our approach avoids learning unpredictable, unnecessary, and inconsistent visual content, and achieves state-of-the-art accuracy on two popular OAD benchmarks, TVSeries and THUMOS14.
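For context on the distillation objective referred to above, the following is a minimal sketch of a standard temperature-scaled KD loss between an offline teacher (which sees future frames at training time) and an online student. This is an illustrative Hinton-style soft-target loss, not the paper's exact PKD objective; the function names and the temperature value are assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a 1-D logit vector."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in standard knowledge distillation.

    teacher_logits: per-frame logits from the offline teacher
                    (trained with access to future frames).
    student_logits: per-frame logits from the online student
                    (past and current frames only).
    """
    p = softmax(teacher_logits, temperature)  # soft targets
    q = softmax(student_logits, temperature)  # student predictions
    return float(temperature ** 2 * np.sum(p * (np.log(p) - np.log(q))))
```

A higher temperature softens the teacher's per-frame action distribution, exposing its relative preferences among classes; the loss is zero when the student matches the teacher exactly and grows as their distributions diverge.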