Neural data compression has been shown to outperform classical methods in terms of rate-distortion ($RD$) performance, with results still improving rapidly. At a high level, neural compression is based on an autoencoder that tries to reconstruct the input instance from a (quantized) latent representation, coupled with a prior that is used to losslessly compress these latents. Due to limited model capacity and imperfect optimization and generalization, such models generally compress test data suboptimally. However, one of the great strengths of learned compression is that if the test-time data distribution is known and has relatively low entropy (e.g., a camera watching a static scene or a dash cam in an autonomous car), the model can easily be finetuned or adapted to this distribution, leading to improved $RD$ performance. In this paper we take this concept to the extreme, adapting the full model to a single video and sending model updates (quantized and compressed using a parameter-space prior) along with the latent representation. Unlike previous work, we finetune not only the encoder/latents but the entire model and, during finetuning, take into account both the effect of model quantization and the additional cost incurred by sending the model updates. We evaluate an image compression model on I-frames (sampled at 2 fps) from videos of the Xiph dataset, and demonstrate that full-model adaptation improves $RD$ performance by approximately 1 dB relative to encoder-only finetuning.
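To make the trade-off described above concrete, the following is a minimal sketch of an objective that charges for model updates as well as latents. It is not the paper's implementation: the zero-mean discretized Gaussian prior, the uniform quantization grid, and the names `quantize`, `model_rate_bits`, `rd_loss`, `step`, `sigma`, and `beta` are all assumptions chosen for illustration.

```python
import math


def quantize(delta, step):
    """Round each parameter update to the nearest multiple of `step` (assumed uniform grid)."""
    return [step * round(d / step) for d in delta]


def _gauss_cdf(x, sigma):
    """CDF of a zero-mean Gaussian with standard deviation `sigma`."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def model_rate_bits(delta_q, step, sigma):
    """Bits needed to code quantized updates under an assumed discretized Gaussian prior.

    Each quantized value is assigned the prior mass of its quantization bin.
    """
    bits = 0.0
    for d in delta_q:
        p = _gauss_cdf(d + step / 2, sigma) - _gauss_cdf(d - step / 2, sigma)
        bits += -math.log2(max(p, 1e-12))  # clamp to avoid log(0)
    return bits


def rd_loss(distortion, latent_bits, model_bits, beta):
    """Rate-distortion objective in which the rate covers latents and model updates."""
    return distortion + beta * (latent_bits + model_bits)


# Hypothetical numbers: three parameter updates, two of which quantize to zero.
delta_q = quantize([0.012, -0.004, 0.0], step=0.01)
m_bits = model_rate_bits(delta_q, step=0.01, sigma=0.05)
loss = rd_loss(distortion=1.0, latent_bits=10.0, model_bits=m_bits, beta=0.1)
```

Under such a prior, updates that quantize to zero are the cheapest to send, so a parameter is only worth changing when the distortion it saves outweighs the extra bits.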