With advances in deep-learning-based image recognition, automatic video analysis by artificial intelligence is becoming more widespread. As the volume of video used for image recognition grows, efficient compression methods for such video data are needed. In general, when encoding degrades image quality, image recognition accuracy also falls. In this paper, we therefore propose a neural-network-based approach that applies post-processing to the encoded video to improve image recognition accuracy, in particular object detection accuracy. Versatile Video Coding (VVC) is used as the video compression method, since it is the latest video coding method with the best encoding performance. The neural network is trained using the features of YOLO-v7, the latest object detection model. By using VVC as the video coding method and YOLO-v7 as the detection model, high object detection accuracy is achieved even at low bit rates. Experimental results show that the proposed method combined with VVC achieves better coding performance than regular VVC in terms of object detection accuracy.
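The abstract states that the post-processing network is trained using the features of YOLO-v7 but does not give the loss. One plausible reading, shown here only as a minimal sketch, is a feature-matching objective: the coded-and-post-processed frame should yield detector features close to those of the original frame. Everything named below is hypothetical: `toy_features` is a stand-in (2x2 average pooling) for a frozen detector backbone, and the mean-squared-error form of `feature_loss` is an assumption, not the paper's confirmed method.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale frame as rows of pixel values


def toy_features(img: Image) -> List[float]:
    """Hypothetical stand-in for a frozen detector backbone.

    Uses 2x2 average pooling so the example runs without any
    deep-learning library; a real setup would use YOLO-v7 features.
    """
    h, w = len(img), len(img[0])
    feats = []
    for y in range(0, h - 1, 2):
        for x in range(0, w - 1, 2):
            block_sum = img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]
            feats.append(block_sum / 4.0)
    return feats


def feature_loss(
    original: Image,
    restored: Image,
    extract: Callable[[Image], List[float]] = toy_features,
) -> float:
    """Assumed objective: mean squared error between feature vectors
    of the original frame and the post-processed coded frame."""
    f_ref = extract(original)
    f_out = extract(restored)
    return sum((a - b) ** 2 for a, b in zip(f_ref, f_out)) / len(f_ref)


# Toy frames: the "coded" frame has lost all signal in this example.
original_frame = [[1.0, 1.0], [1.0, 1.0]]
coded_frame = [[0.0, 0.0], [0.0, 0.0]]

loss_identical = feature_loss(original_frame, original_frame)
loss_degraded = feature_loss(original_frame, coded_frame)
```

In such a setup, the post-processing network's weights would be updated to reduce this loss while the feature extractor stays fixed, so that the detector sees features closer to those of uncompressed video.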