Decomposition, Compression, and Synthesis Based Video Coding: A Neural Approach Through Reference-Based Super Resolution

In pursuit of higher compression efficiency, one potential solution is Down-Sampling based Video Coding (DSVC), in which an input video is first downscaled and encoded at a relatively lower resolution, and the decoded frames are then super-resolved by deep neural networks (DNNs). However, the coding gains of existing DSVC methods are often bounded, either because uniform resolution sampling causes a severe loss of high-frequency components, or because information is insufficiently aggregated across non-uniformly sampled frames. To address this, we propose to first decompose the input video into spatial texture frames (STFs), kept at the native spatial resolution to preserve rich spatial details, and temporal motion frames (TMFs), kept at a lower spatial resolution to retain motion smoothness; then to compress them together using any popular video coder; and finally to synthesize the decoded STFs and TMFs into a high-fidelity reconstruction at the same resolution as the native input. This work simply applies bicubic sampling in decomposition and a Versatile Video Coding (VVC) compliant codec in compression, and focuses on the synthesis part. Such cross-resolution synthesis can be facilitated by Reference-based Super-Resolution (RefSR). Specifically, a motion compensation network (MCN) is devised on the TMFs to efficiently align and aggregate temporal motion features, which are then jointly processed with the corresponding STFs by a texture transfer network (TTN) to better augment spatial details. In this way, compression and resolution re-sampling noise can be effectively alleviated, yielding better rate-distortion (R-D) efficiency.
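To make the decomposition step concrete, the following is a minimal sketch of how frames might be assigned to the two streams before encoding. The abstract does not state how STFs are selected or what scale factor is used, so the rule here (the first frame of every group of pictures becomes an STF, all other frames become TMFs downscaled by a factor of 2) is purely an assumption for illustration, as are the names `FramePlan`, `plan_decomposition`, `gop_size`, and `scale`. The actual bicubic resampling and the VVC encoding are left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FramePlan:
    index: int              # position of the frame in the input video
    role: str               # "STF" (native resolution) or "TMF" (downscaled)
    size: Tuple[int, int]   # (width, height) at which the frame is encoded


def plan_decomposition(num_frames: int, width: int, height: int,
                       gop_size: int = 8, scale: int = 2) -> List[FramePlan]:
    """Assign each frame to the STF or TMF stream.

    Assumption (not from the abstract): the first frame of each group of
    `gop_size` frames is kept as an STF at native resolution; every other
    frame becomes a TMF whose width and height are divided by `scale`.
    """
    if num_frames < 0 or gop_size < 1 or scale < 1:
        raise ValueError("num_frames must be >= 0; gop_size and scale must be >= 1")

    low_res = (width // scale, height // scale)
    plans = []
    for i in range(num_frames):
        if i % gop_size == 0:
            plans.append(FramePlan(i, "STF", (width, height)))
        else:
            plans.append(FramePlan(i, "TMF", low_res))
    return plans


if __name__ == "__main__":
    # Hypothetical 16-frame 1080p clip.
    for p in plan_decomposition(16, 1920, 1080):
        print(p.index, p.role, p.size)
```

Under this assumed rule, most frames are coded at a quarter of the native pixel count, while the sparse STFs carry the high-frequency detail that the synthesis stage later transfers onto the upscaled TMFs.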
