Action recognition, early prediction, and online action detection are complementary disciplines that are often studied independently. Most online action detection networks use a pre-trained feature extractor, which may not be optimal for the new task. We address task-specific feature extraction with a teacher-student framework spanning the aforementioned disciplines and a novel training strategy. Our network, the Online Knowledge Distillation Action Detection network (OKDAD), embeds online early prediction and online temporal segment proposal subnetworks in parallel. Low interclass and high intraclass similarity are encouraged during teacher training. Knowledge distillation to the OKDAD network is ensured via layer reuse and cosine similarity between teacher and student feature vectors. Layer reuse and similarity learning significantly improve our baseline, which uses a generic feature extractor. We evaluate our framework on infrared videos from two popular datasets, NTU RGB+D (action recognition, early prediction) and PKU MMD (action detection). Unlike previous attempts on those datasets, our student networks perform without any knowledge of the future. Even with this added difficulty, we achieve state-of-the-art results on both datasets. Moreover, our networks use the infrared stream of RGB-D cameras, which, to our knowledge, we are the first to use for online action detection.
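The cosine-similarity objective used to align teacher and student feature vectors can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function names and the exact loss form (one minus cosine similarity) are assumptions.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def distillation_loss(student_feat, teacher_feat):
    # Hypothetical loss: 0 when the student feature points in the
    # same direction as the teacher feature, up to 2 when opposite.
    return 1.0 - cosine_similarity(student_feat, teacher_feat)
```

Minimizing such a loss pushes the student's features toward the teacher's, which is one common way to realize feature-level knowledge distillation.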