Neural image compression has surpassed state-of-the-art traditional codecs (H.266/VVC) in rate-distortion (RD) performance, but it suffers from high complexity and requires separate models for different rate-distortion trade-offs. In this paper, we propose an Efficient single-model Variable-bit-rate Codec (EVC), which runs at 30 FPS on 768x512 input images and still outperforms VVC in RD performance. By further reducing both encoder and decoder complexity, our small model even achieves 30 FPS on 1920x1080 input images. To bridge the performance gap between our models of different capacities, we meticulously design mask decay, which automatically transforms the large model's parameters into those of the small model. We also propose a novel sparsity regularization loss to mitigate the shortcomings of $L_p$ regularization. Our algorithm significantly narrows the performance gap, by 50% for our medium model and 30% for our small model. Finally, we advocate a scalable encoder for neural image compression, whose encoding complexity is dynamic to meet different latency requirements. We propose decaying the large encoder multiple times to progressively reduce the residual representation. Both mask decay and residual representation learning greatly improve the RD performance of our scalable encoder. Our code is at https://github.com/microsoft/DCVC.
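The abstract does not spell out how mask decay works. As a rough illustration of the general idea of shrinking a large layer into a smaller one without an abrupt cut, the sketch below repeatedly scales down the weights of output channels marked for removal and then drops them once they are near zero. The keep-mask, the decay factor `alpha`, the per-channel weight layout, and the helper names `decay_masked_channels` and `prune` are all hypothetical; this is a generic channel-decay sketch under those assumptions, not the EVC authors' implementation (the linked repository is the authoritative source).

```python
from typing import List

# Hypothetical layout: one row of weights per output channel of a layer.
Weights = List[List[float]]


def decay_masked_channels(weights: Weights, keep: List[bool], alpha: float) -> Weights:
    """Scale the weights of channels marked for removal by alpha (0 < alpha < 1).

    Kept channels are returned unchanged; removed channels shrink
    geometrically toward zero instead of being cut out in one step.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    return [row if k else [w * alpha for w in row] for row, k in zip(weights, keep)]


def prune(weights: Weights, keep: List[bool]) -> Weights:
    """Physically drop the decayed channels, leaving the smaller layer."""
    return [row for row, k in zip(weights, keep) if k]


# Toy usage: a "large" layer with 4 output channels, of which 2 are kept.
large = [[1.0, -2.0], [0.5, 0.5], [3.0, 1.0], [-1.0, 4.0]]
keep = [True, False, True, False]

w = large
for step in range(20):
    # In real training, a gradient step on the RD loss would go here,
    # letting the kept channels compensate as the others fade out.
    w = decay_masked_channels(w, keep, alpha=0.5)

small = prune(w, keep)
print(len(small))  # 2
print(max(abs(x) for row, k in zip(w, keep) if not k for x in row) < 1e-5)  # True
```

Decaying gradually rather than zeroing channels at once gives the remaining parameters time to absorb the removed channels' contribution, which is the motivation the abstract gives for transforming the large model into the small one.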