With the widespread use of installed cameras, video-based monitoring approaches have attracted considerable attention for purposes such as assisted living. Temporal redundancy and the sheer size of raw videos are the two most common problems for video processing algorithms. Most existing methods focus mainly on increasing accuracy by processing consecutive frames, which is computationally expensive and unsuitable for real-time applications. Because videos are mostly stored and transmitted in compressed form, compressed video is readily available on many devices. Compressed videos contain a wealth of useful information, such as motion vectors and quantized coefficients, and making proper use of it can greatly improve the performance of video understanding methods. This paper presents an approach that uses residual data taken directly from compressed videos, obtained through a lightweight partial decoding procedure. In addition, it proposes a method for accumulating similar residuals, which dramatically reduces the number of frames that must be processed for action recognition. Applying neural networks only to the accumulated residuals in the compressed domain speeds up processing, while classification results remain highly competitive with approaches that operate on raw video.
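The abstract does not specify how residuals are judged similar or how they are accumulated, so the following is only a minimal sketch under assumed choices, not the paper's method. It assumes each residual frame is a flattened list of numbers, that similarity is measured as the mean absolute difference to the first frame of the current group, and that a hypothetical `threshold` parameter decides when a new group starts. Each group is collapsed into one frame of summed absolute residuals, which shows how accumulation can shrink the number of frames passed to a classifier.

```python
from typing import List, Optional, Sequence

Frame = List[float]  # hypothetical: one residual frame flattened to a 1-D list


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized residual frames."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def accumulate_similar_residuals(
    frames: Sequence[Sequence[float]], threshold: float
) -> List[Frame]:
    """Group consecutive residual frames and sum their magnitudes per group.

    Assumed rule: a new group starts whenever a frame differs from the first
    frame of the current group by more than `threshold` (mean absolute
    difference). Returns one accumulated frame per group.
    """
    accumulated: List[Frame] = []
    anchor: Optional[Sequence[float]] = None  # first frame of the current group
    for frame in frames:
        if anchor is None or mean_abs_diff(frame, anchor) > threshold:
            # Dissimilar frame: open a new group seeded with its magnitudes.
            anchor = frame
            accumulated.append([abs(v) for v in frame])
        else:
            # Similar frame: add its residual magnitudes to the current group.
            acc = accumulated[-1]
            for i, v in enumerate(frame):
                acc[i] += abs(v)
    return accumulated


if __name__ == "__main__":
    residuals = [[1, -1, 0], [1, -2, 0], [5, 5, 5], [5, 4, 5]]
    # Four residual frames collapse into two accumulated frames.
    print(accumulate_similar_residuals(residuals, threshold=1.0))
```

Running the script prints `[[2, 3, 0], [10, 9, 10]]`: the first two frames and the last two frames are each merged, so a downstream network would see two inputs instead of four. A real implementation would operate on decoded residual arrays per macroblock or per channel; the flat lists here only keep the sketch dependency-free.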