As deep convolutional neural networks (DNNs) are widely used in various fields of computer vision, leveraging the overfitting ability of the DNN to achieve video resolution upscaling has become a new trend in the modern video delivery system. By dividing videos into chunks and overfitting each chunk with a super-resolution model, the server encodes videos before transmitting them to the clients, thus achieving better video quality and transmission efficiency. However, a large number of chunks are expected to ensure good overfitting quality, which substantially increases the storage and consumes more bandwidth resources for data transmission. On the other hand, decreasing the number of chunks through training optimization techniques usually requires high model capacity, which significantly slows down execution speed. To reconcile such, we propose a novel method for high-quality and efficient video resolution upscaling tasks, which leverages the spatial-temporal information to accurately divide video into chunks, thus keeping the number of chunks as well as the model size to minimum. Additionally, we advance our method into a single overfitting model by a data-aware joint training technique, which further reduces the storage requirement with negligible quality drop. We deploy our models on an off-the-shelf mobile phone, and experimental results show that our method achieves real-time video super-resolution with high video quality. Compared with the state-of-the-art, our method achieves 28 fps streaming speed with 41.6 PSNR, which is 14$\times$ faster and 2.29 dB better in the live video resolution upscaling tasks. Our codes are available at: https://github.com/coulsonlee/STDO-CVPR2023.git
翻译:由于深相神经网络(DNN)被广泛用于计算机视觉的各个领域,利用DNN超配能力实现视频分辨率升级已成为现代视频传输系统的新趋势。通过将视频分为块块和以超级分辨率模型超配每个块,服务器在将视频编码后再将其传送给客户,从而实现更好的视频质量和传输效率。然而,许多块预计将确保高超配质量,从而大大增加存储量并消耗更多的带宽资源进行数据传输。另一方面,通过培训优化技术减少块数通常需要高模型能力,这大大减缓了执行速度。为了调和,我们提出了一个高质量的高质量视频解决方案升级任务的新方法,利用空间时间信息将视频分为块,从而将块数与模型的大小保持最小化。此外,我们通过数据更新联合培训技术将我们的方法升级为单一的超配模型,进一步降低存储速度要求,从而大大降低执行速度。我们用微量质量的图像模型在高分辨率上展示了我们的视频模型。我们用高分辨率的图像模型在高分辨率上展示了一种状态。</s>