Information Elevation Network for Fast Online Action Detection

From arXiv. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Online action detection (OAD) is a task that receives video segments within a streaming video as inputs and identifies the actions ongoing in them. Retaining past information associated with the current action is therefore important. However, long short-term memory (LSTM), a popular recurrent unit for modeling temporal information from videos, accumulates past information from the previous hidden and cell states and the visual features extracted at each timestep without considering the relationships between past and current information. Consequently, the forget gate of the original LSTM can discard accumulated information relevant to the current action, because it decides which information to forget without considering the current action. We introduce a novel information elevation unit (IEU) that lifts up and accumulates the past information relevant to the current action, in order to model the past information that is especially relevant to it. Through ablation studies, we design an efficient and effective OAD network using IEUs, called an information elevation network (IEN). To the best of our knowledge, our IEN is the first attempt to consider computational overhead for the practical use of OAD. Our IEN uses visual features extracted by a fast action recognition network that takes only RGB frames, because extracting optical flows incurs heavy computational overhead. On two OAD benchmark datasets, THUMOS-14 and TVSeries, our IEN outperforms state-of-the-art OAD methods that use only RGB frames. Furthermore, on the THUMOS-14 dataset, our IEN outperforms state-of-the-art OAD methods that use two-stream features based on RGB frames and optical flows.
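To make the abstract's point about the forget gate concrete, here is a minimal NumPy sketch of a *standard* LSTM cell fed one feature vector per timestep, as in streaming OAD. It is not the IEU, whose equations the abstract does not give. The sketch shows that the forget gate `f` is computed from the current features and the previous hidden state and then scales the old cell state elementwise; nothing in the cell explicitly compares the accumulated memory with the current action. The function names (`lstm_step`, `run_stream`), the gate ordering, the sizes, and the random "features" are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell (illustrative, not the IEU).

    W: shape (4*H, D+H), b: shape (4*H,). Gates are stacked in the
    order input, forget, candidate, output (an arbitrary choice here).
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    i = sigmoid(z[0:H])          # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])      # forget gate: scales the old cell state
    g = np.tanh(z[2 * H:3 * H])  # candidate content from current features
    o = sigmoid(z[3 * H:4 * H])  # output gate
    c = f * c_prev + i * g       # accumulated memory (cell state)
    h = o * np.tanh(c)           # hidden state exposed to a classifier
    return h, c

def run_stream(features, W, b, hidden_size):
    """Feed per-chunk features one timestep at a time, as a stream."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    states = []
    for x_t in features:
        h, c = lstm_step(x_t, h, c, W, b)
        states.append(h)
    return np.stack(states)

# Usage with hypothetical sizes: D-dim features, H hidden units, T chunks.
rng = np.random.default_rng(0)
D, H, T = 8, 4, 5
W = rng.normal(scale=0.1, size=(4 * H, D + H))
b = np.zeros(4 * H)
features = rng.normal(size=(T, D))  # stand-in for per-chunk RGB features
hs = run_stream(features, W, b, H)
print(hs.shape)  # (5, 4)
```

With all weights and biases at zero, every gate equals 0.5 and the candidate is 0, so one step simply halves the previous cell state regardless of what the current input is. This is a toy illustration of memory being forgotten without reference to the current action.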
