Image inpainting is an underdetermined inverse problem: many different contents can realistically fill the missing or corrupted regions. Prevalent approaches based on convolutional neural networks (CNNs) can synthesize visually pleasing contents, but CNNs have limited receptive fields for capturing global features. With image-level attention, transformers can model long-range dependencies and generate diverse contents through autoregressive modeling of pixel-sequence distributions. However, the unidirectional attention in autoregressive transformers is suboptimal because corrupted image regions may have arbitrary shapes, with context coming from any direction. We propose BAT-Fill, an image inpainting framework built on a novel bidirectional autoregressive transformer (BAT). BAT uses transformers to learn autoregressive distributions, which naturally allows diverse generation of missing contents. In addition, it incorporates a masked language model in the style of BERT, which enables bidirectional modeling of the contextual information around missing regions for better image completion. Extensive experiments over multiple datasets show that BAT-Fill achieves superior diversity and fidelity in image inpainting, both qualitatively and quantitatively.
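The contrast between unidirectional and bidirectional context can be illustrated with attention-visibility matrices over a flattened token sequence. The following is a minimal sketch under stated assumptions, not the BAT-Fill implementation: the function names, the raster ordering of missing tokens, and the exact visibility rule are all hypothetical and chosen only to show why a causal mask hides context that lies after a missing position.

```python
def causal_visibility(n):
    """visible[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_context_visibility(missing):
    """Hypothetical rule (an assumption, not taken from the paper):
    every position sees all known positions, from either direction;
    a missing position additionally sees itself and earlier missing positions.
    """
    n = len(missing)
    visible = []
    for i in range(n):
        row = []
        for j in range(n):
            if not missing[j]:
                row.append(True)  # known context is visible from any direction
            else:
                # missing tokens are only visible to missing queries at or after them
                row.append(missing[i] and j <= i)
        visible.append(row)
    return visible


def visible_known_context(visible, missing, i):
    """Indices of known (uncorrupted) positions that position i can attend to."""
    return [j for j in range(len(missing)) if visible[i][j] and not missing[j]]


# Toy sequence of 5 tokens; positions 1 and 2 are missing.
missing = [False, True, True, False, False]
causal = causal_visibility(len(missing))
bidir = bidirectional_context_visibility(missing)

print(visible_known_context(causal, missing, 1))  # [0]
print(visible_known_context(bidir, missing, 1))   # [0, 3, 4]
```

In this toy case the causal mask lets the first missing token see only the single known token before it, while the bidirectional rule also exposes the known tokens that follow the hole, and the missing tokens are still filled in a fixed order.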