One critical component in lossy deep image compression is the entropy model, which predicts the probability distribution of the quantized latent representation in the encoding and decoding modules. Previous works build entropy models upon convolutional neural networks, which are inefficient at capturing global dependencies. In this work, we propose a novel transformer-based entropy model, termed Entroformer, to capture long-range dependencies in probability distribution estimation effectively and efficiently. Unlike vision transformers for image classification, the Entroformer is highly optimized for image compression, including a top-k self-attention and a diamond relative position encoding. Meanwhile, we further expand this architecture with a parallel bidirectional context model to speed up the decoding process. Experiments show that the Entroformer achieves state-of-the-art performance on image compression while being time-efficient.
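To make the top-k self-attention idea concrete, the sketch below keeps only the k largest attention scores per query and masks the rest before the softmax. It is a minimal single-head illustration with assumed tensor shapes and the hypothetical parameter name `topk`; the Entroformer's actual multi-head implementation and hyperparameters may differ.

```python
# Minimal sketch of top-k self-attention (single head, shape (B, N, D)).
# Assumption: this is an illustrative re-implementation, not the paper's code.
import torch
import torch.nn.functional as F

def topk_self_attention(q, k, v, topk=16):
    """Keep only the top-k attention scores per query before softmax."""
    d = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5      # (B, N, N)
    # k-th largest score in each row; entries below it are masked out.
    kth = torch.topk(scores, min(topk, scores.size(-1)), dim=-1).values[..., -1:]
    scores = scores.masked_fill(scores < kth, float("-inf"))
    attn = F.softmax(scores, dim=-1)
    return torch.matmul(attn, v)

# Usage sketch: q, k, v would be learned projections of the latent/context features.
B, N, D = 2, 64, 32
q, k, v = (torch.randn(B, N, D) for _ in range(3))
out = topk_self_attention(q, k, v, topk=8)                        # (2, 64, 32)
```

Restricting each query to its k strongest keys sparsifies the attention map, which is the mechanism the abstract refers to for making global dependency modeling tractable in the entropy model.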