Recently, learned image compression has achieved remarkable performance. The entropy model, which accurately estimates the distribution of the latent representation, plays an important role in boosting rate-distortion performance. However, most entropy models capture correlations in only one dimension, whereas the latent representation contains channel-wise, local spatial, and global spatial correlations. To address this issue, we propose the multi-reference entropy models MEM and MEM+ to capture channel, local spatial, and global spatial contexts. We divide the latent representation into slices. When decoding the current slice, we use previously decoded slices as contexts and employ the attention map of a previously decoded slice to predict global correlations in the current slice. To capture local contexts, we propose enhanced checkerboard context capturing, which avoids performance degradation while retaining two-pass decoding. Based on MEM and MEM+, we propose the image compression models MLIC and MLIC+. Extensive experimental evaluations show that MLIC and MLIC+ achieve state-of-the-art performance, reducing BD-rate by 9.77% and 13.09% respectively on the Kodak dataset over VVC when measured in PSNR.
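As a rough illustration of the two-pass checkerboard idea mentioned above (a minimal sketch, not the paper's implementation), the latent grid can be split into anchor positions, decoded first without spatial context, and non-anchor positions, decoded second using the already-available anchor neighbors as local context:

```python
def checkerboard_masks(h, w):
    """Hypothetical sketch: partition an h x w latent grid into two passes.

    Pass 1 decodes anchor positions (no spatial context available);
    pass 2 decodes non-anchor positions, whose four-neighbors are all
    anchors and thus already decoded, providing local context.
    """
    anchor = [[(i + j) % 2 == 0 for j in range(w)] for i in range(h)]
    non_anchor = [[not a for a in row] for row in anchor]
    return anchor, non_anchor


anchor, non_anchor = checkerboard_masks(2, 2)
# Anchors form one color of the checkerboard, non-anchors the other,
# so every non-anchor position is surrounded by decoded anchors.
```

The two-pass schedule keeps decoding parallel within each pass, unlike a fully serial autoregressive raster scan.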