Online action detection, which aims to identify an ongoing action from a streaming video, is important for real-world applications. For this task, previous methods use recurrent neural networks to model temporal relations in an input sequence. However, these methods overlook the fact that the input image sequence includes not only the action of interest but also background and irrelevant actions. As a result, recurrent units accumulate unnecessary information when encoding features of the action of interest. To overcome this problem, we propose a novel recurrent unit, named Information Discrimination Unit (IDU), which explicitly discriminates how relevant the input information is to the ongoing action, as opposed to other content, and uses this to decide whether to accumulate it. This enables learning more discriminative representations for identifying an ongoing action. In this paper, we further present a new recurrent unit, called Information Integration Unit (IIU), for action anticipation. Our IIU uses the outputs of IDU as pseudo action labels, together with RGB frames, to learn enriched features of observed actions effectively. In experiments on TVSeries and THUMOS-14, the proposed methods outperform state-of-the-art methods by a significant margin in online action detection and action anticipation. Moreover, we demonstrate the effectiveness of the proposed units through comprehensive ablation studies.
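The gating idea described above, deciding whether to accumulate each input according to its relevance to the ongoing action, can be illustrated with a minimal NumPy sketch. This is not the IDU as defined in the paper: the class name `RelevanceGatedCell`, the weight shapes, the random initialization, and the choice of the most recent frame feature as the reference for the ongoing action are all assumptions made for illustration. The sketch is a GRU-style cell whose reset and update gates see a reference feature in addition to the usual hidden state and input.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class RelevanceGatedCell:
    """GRU-style cell whose gates also see a reference feature.

    Hypothetical illustration only; these are not the IDU equations.
    """

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        gate_dim = hidden_dim + 2 * input_dim  # [h_prev, x_t, x_ref]
        self.hidden_dim = hidden_dim
        self.W_r = rng.normal(0.0, 0.1, (hidden_dim, gate_dim))
        self.W_z = rng.normal(0.0, 0.1, (hidden_dim, gate_dim))
        self.W_h = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim + input_dim))

    def step(self, h_prev, x_t, x_ref):
        # Both gates are conditioned on the reference feature, so they can
        # suppress inputs that look unrelated to the ongoing action.
        gate_in = np.concatenate([h_prev, x_t, x_ref])
        r = sigmoid(self.W_r @ gate_in)  # reset gate
        z = sigmoid(self.W_z @ gate_in)  # update gate
        cand = np.tanh(self.W_h @ np.concatenate([r * h_prev, x_t]))
        # z near 0 keeps the old state; z near 1 accumulates the new input.
        return (1.0 - z) * h_prev + z * cand

    def encode(self, frames):
        # Assumption: the latest frame feature stands in for the ongoing action.
        x_ref = frames[-1]
        h = np.zeros(self.hidden_dim)
        for x_t in frames:
            h = self.step(h, x_t, x_ref)
        return h


# Usage: encode a chunk of 5 frame features of dimension 4.
frames = np.random.default_rng(1).normal(size=(5, 4))
cell = RelevanceGatedCell(input_dim=4, hidden_dim=3)
h = cell.encode(frames)
```

With untrained random weights the gates carry no meaning; in practice the weights would be learned so that the update gate stays low for background and irrelevant frames.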