Bicubic downscaling is widely used to reduce video storage costs and to speed up downstream processing. However, the inverse upscaling step is non-trivial, and the downscaled video may also degrade the performance of downstream tasks. In this paper, we propose a self-conditioned probabilistic framework for video rescaling that learns the paired downscaling and upscaling procedures simultaneously. During training, we decrease the entropy of the information lost in downscaling by maximizing its probability conditioned on the strong spatial-temporal prior information within the downscaled video. After optimization, the video downscaled by our framework preserves more meaningful information, which benefits both the upscaling step and downstream tasks such as video action recognition. We further extend the framework to a lossy video compression system, for which we propose a gradient estimator for non-differentiable industrial lossy codecs that enables end-to-end training of the whole system. Extensive experimental results demonstrate the superiority of our approach on video rescaling, video compression, and efficient action recognition tasks.
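The idea that conditioning on a prior lowers the entropy of the lost information can be illustrated with a toy example. The sketch below is not the framework described above: it assumes a scalar Gaussian model, synthetic data, and a linear predictor, and the names `prior`, `lost`, and `gaussian_nll` are hypothetical. It only shows that fitting the likelihood of the lost detail conditioned on a correlated signal gives a lower negative log-likelihood than fitting it unconditionally.

```python
import numpy as np


def gaussian_nll(z, mean, std):
    """Average negative log-likelihood (nats per element) of z under N(mean, std^2)."""
    return float(
        np.mean(0.5 * np.log(2.0 * np.pi * std ** 2) + 0.5 * ((z - mean) / std) ** 2)
    )


rng = np.random.default_rng(0)

# Hypothetical toy data (an assumption, not from the paper):
# `prior` stands in for a feature computed from the downscaled video,
# `lost` stands in for the detail discarded by downscaling.
prior = rng.normal(size=10_000)
lost = 0.8 * prior + 0.3 * rng.normal(size=10_000)

# Unconditional model: a single Gaussian fitted to the lost detail.
nll_uncond = gaussian_nll(lost, lost.mean(), lost.std())

# Conditional model: the mean is a least-squares linear function of the prior,
# and the standard deviation is that of the remaining residual.
slope, intercept = np.polyfit(prior, lost, deg=1)
pred = slope * prior + intercept
nll_cond = gaussian_nll(lost, pred, (lost - pred).std())

print(f"unconditional NLL: {nll_uncond:.3f} nats")
print(f"conditional NLL:   {nll_cond:.3f} nats")
```

For a Gaussian fitted by maximum likelihood, the average negative log-likelihood equals the differential entropy of the fitted model, so the gap between the two printed values is an estimate of how much uncertainty about the lost detail the prior removes in this toy setting.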