While recent machine learning research has revealed connections between deep generative models such as VAEs and the rate-distortion losses used in learned compression, most of this work has focused on images. In a similar spirit, we view recently proposed neural video coding algorithms through the lens of deep autoregressive and latent variable modeling. We present recent neural video codecs as instances of a generalized stochastic temporal autoregressive transform, and propose new avenues for further improvement inspired by normalizing flows and structured priors. We propose several architectures that yield state-of-the-art video compression performance on full-resolution video, and analyze their tradeoffs through ablation studies. In particular, we propose (i) improved temporal autoregressive transforms, (ii) improved entropy models with structured and temporal dependencies, and (iii) variable-bitrate versions of our algorithms. Since our improvements are compatible with a large class of existing models, we provide further evidence that the generative modeling viewpoint can advance the neural video coding field.
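As a concrete illustration of the generalized transform (the notation below is our own schematic shorthand rather than a verbatim statement of the method), each frame $x_t$ can be predicted from previously reconstructed frames through a deterministic shift and scale, modulated by a stochastic residual decoded from the transmitted latent $z_t$:
\[
x_t = h_\mu(x_{<t}) + h_\sigma(x_{<t}) \odot g(z_t, x_{<t}),
\]
where $h_\mu$ and $h_\sigma$ are learned temporal predictors and $g$ decodes the residual. Under this reading, conventional residual coding, with a motion-compensated warp for $h_\mu$ and $h_\sigma \equiv 1$, appears as a special case, which is what makes the formulation a useful unifying lens for existing codecs.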