Recent studies have shown the importance of modeling long-range interactions for image inpainting. To achieve this goal, existing approaches exploit either standalone attention techniques or transformers, but usually at low resolutions due to the computational cost. In this paper, we present a novel transformer-based model for large hole inpainting, which unifies the merits of transformers and convolutions to efficiently process high-resolution images. We carefully design each component of our framework to guarantee the high fidelity and diversity of recovered images. Specifically, we customize an inpainting-oriented transformer block, where the attention module aggregates non-local information only from valid tokens, indicated by a dynamic mask. Extensive experiments demonstrate the state-of-the-art performance of the new model on multiple benchmark datasets. Code is released at https://github.com/fenglinglwb/MAT.
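To illustrate the core idea of attention restricted to valid tokens, below is a minimal PyTorch sketch of mask-guided scaled dot-product attention. This is not the paper's actual implementation (the released model uses a more elaborate multi-head, window-based design with dynamic mask updating); the function name `mask_aware_attention` and the single-head formulation are illustrative assumptions.

```python
import torch

def mask_aware_attention(q, k, v, valid_mask, neg_inf=-1e9):
    """Attention that aggregates information only from valid tokens.

    Illustrative sketch, not the official MAT block.
    q, k, v:    (B, N, C) token embeddings
    valid_mask: (B, N) tensor, 1 for valid (known) tokens, 0 for hole tokens
    """
    scale = q.shape[-1] ** -0.5
    attn = (q @ k.transpose(-2, -1)) * scale            # (B, N, N) similarity scores
    # Suppress attention toward hole tokens before the softmax, so each
    # output token is a weighted sum over valid tokens only.
    attn = attn.masked_fill(valid_mask[:, None, :] == 0, neg_inf)
    attn = attn.softmax(dim=-1)
    return attn @ v                                     # (B, N, C)

# Usage: 2 images, 16 tokens, 32 channels; first half of tokens valid.
q = k = v = torch.randn(2, 16, 32)
valid_mask = torch.zeros(2, 16)
valid_mask[:, :8] = 1
out = mask_aware_attention(q, k, v, valid_mask)
```

In the paper's framework the mask is dynamic: as blocks progressively fill in the hole, more tokens are marked valid, so later blocks attend over an ever larger set of reliable features.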