Convolutional neural networks (CNNs) are often inefficient at propagating information across distant spatial positions in images. Recent studies in image inpainting attempt to overcome this issue by explicitly searching for reference regions throughout the entire image and using their features to fill the missing regions. This operation can be implemented as a contextual attention layer (CA layer) \cite{yu2018generative}, which has been widely adopted in deep learning-based methods. However, it incurs significant computational overhead, as it computes the pairwise similarity of feature patches at every spatial position. It also often fails to find proper reference regions, owing to the lack of supervision on the correspondence between missing regions and known regions. We propose a novel contextual reconstruction loss (CR loss) to address these problems. First, a criterion for searching reference regions is designed based on minimizing the reconstruction and adversarial losses corresponding to the searched references and the ground-truth image. Second, unlike previous approaches that integrate the computationally heavy patch searching and replacement operation into the inpainting model, the CR loss encourages a vanilla CNN to simulate this behavior during training, so no extra computation is required during inference. Experimental results demonstrate that the proposed inpainting model trained with the CR loss compares favourably against the state of the art in both quantitative and visual performance. Code is available at \url{https://github.com/zengxianyu/crfill}.
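To make the CA layer's overhead concrete, the following is a minimal sketch (not the authors' implementation) of the pairwise patch-similarity step it performs: extracting all feature patches and computing their cosine similarities, which scales quadratically with the number of spatial positions. The function name `patch_similarity` and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def patch_similarity(features, patch_size=3):
    """Pairwise cosine similarity between all feature patches.

    Illustrative stand-in for the CA layer's patch-matching step
    (hypothetical helper, not the paper's code).
    features: (H, W, C) feature map.
    Returns an (N, N) similarity matrix, N = (H-p+1)*(W-p+1).
    """
    H, W, C = features.shape
    p = patch_size
    patches = []
    for i in range(H - p + 1):
        for j in range(W - p + 1):
            patches.append(features[i:i + p, j:j + p, :].ravel())
    P = np.stack(patches)  # (N, p*p*C)
    # Normalize each patch so the dot product gives cosine similarity.
    P = P / (np.linalg.norm(P, axis=1, keepdims=True) + 1e-8)
    return P @ P.T  # (N, N): every patch compared against every other
```

Because the output is an $N \times N$ matrix over all spatial positions, the cost grows quadratically with feature-map size; this is the overhead that the CR loss avoids at inference time by training a plain CNN to mimic the search-and-replace behavior.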