Implicit Neural Representations (INRs) have recently been shown to be a powerful tool for high-quality video compression. However, existing works are limited in that they do not explicitly exploit the temporal redundancy in videos, leading to long encoding times. Additionally, these methods use fixed architectures that do not scale to longer videos or higher resolutions. To address these issues, we propose NIRVANA, which treats a video as groups of frames and fits a separate network to each group, performing patch-wise prediction. This design shares computation within each group, along both the spatial and temporal dimensions, reducing the encoding time of the video. The video representation is modeled autoregressively: the network fit to the current group is initialized with the weights of the previous group's model. To further enhance efficiency, we quantize the network parameters during training, requiring no post-hoc pruning or quantization. Compared with previous works on the benchmark UVG dataset, NIRVANA improves encoding quality from 37.36 to 37.70 (in terms of PSNR) and encoding speed by 12X, while maintaining the same compression rate. In contrast to prior video INR works, which struggle with larger resolutions and longer videos, our algorithm is highly flexible and scales naturally thanks to its patch-wise and autoregressive design. Moreover, our method achieves variable-bitrate compression by adapting to videos with varying inter-frame motion. NIRVANA also achieves 6X faster decoding and scales well with more GPUs, making it practical for various deployment scenarios.
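The autoregressive design above can be illustrated with a minimal, purely conceptual sketch: each group of frames gets its own model, and each model is warm-started from the previous group's weights so that temporal redundancy across groups speeds up fitting. Here a toy linear model fitted by gradient descent stands in for the per-group network; all names, the data, and the optimizer are hypothetical and not the paper's actual implementation.

```python
import numpy as np

def fit_group(X, y, w_init, lr=0.1, steps=200):
    """Fit a toy linear model to one group's data by gradient descent,
    starting from the previous group's weights (autoregressive warm start)."""
    w = w_init.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)  # least-squares gradient
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))  # stand-in for patch coordinates/features

# Consecutive "groups" have slowly drifting targets, mimicking the
# temporal redundancy between neighboring groups of frames.
w_true = np.array([1.0, -0.5, 0.25, 0.0])
groups = [X @ (w_true + 0.05 * g) for g in range(3)]

w = np.zeros(4)  # the first group is fit from scratch
weights = []
for y in groups:
    w = fit_group(X, y, w)  # later groups warm-start from previous weights
    weights.append(w)
```

Because each group's target differs only slightly from the previous one, the warm-started fits begin close to their solution, which is the intuition behind the reduced encoding time reported for NIRVANA.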