Attention mechanisms have become crucially important in deep learning in recent years. These non-local operations, which are similar to traditional patch-based methods in image processing, complement local convolutions. However, computing the full attention matrix is an expensive step with heavy memory and computational loads. These limitations constrain network architectures and performance, in particular in the case of high-resolution images. We propose an efficient attention layer based on the stochastic algorithm PatchMatch, which computes approximate nearest neighbors. We refer to our proposed layer as a "Patch-based Stochastic Attention Layer" (PSAL). Furthermore, we propose different approaches, based on patch aggregation, to ensure the differentiability of PSAL, thus allowing end-to-end training of any network containing our layer. PSAL has a small memory footprint and can therefore scale to high-resolution images. It maintains this footprint without sacrificing the spatial precision and globality of the nearest neighbors, which means that it can be easily inserted at any level of a deep architecture, even in shallower levels. We demonstrate the usefulness of PSAL on several image editing tasks, such as image inpainting, guided image colorization, and single-image super-resolution. Our code is available at: https://github.com/ncherel/psal
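The memory argument above can be made concrete with a small sketch. The toy example below (a simplification, not the PSAL implementation) contrasts full attention, which materializes an N × N score matrix, with attention restricted to the k nearest keys per query, which needs only O(Nk) scores. For brevity the nearest neighbors here are found by brute force; the point of PSAL is precisely to replace this search with a PatchMatch-style approximate nearest-neighbor search so that the quadratic distance matrix is never built either. All names and sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy feature map: N patch descriptors of dimension d.
N, d, k = 64, 8, 4
Q = rng.standard_normal((N, d))
K = rng.standard_normal((N, d))
V = rng.standard_normal((N, d))

# Full attention: materializes an N x N score matrix (quadratic memory).
S = Q @ K.T
E = np.exp(S - S.max(axis=1, keepdims=True))
full_attn = (E / E.sum(axis=1, keepdims=True)) @ V

# Sparse attention over the k nearest keys per query. The brute-force
# distance matrix below is for illustration only; a PatchMatch-style
# search would produce `nn` without ever forming an N x N array.
d2 = ((Q[:, None, :] - K[None, :, :]) ** 2).sum(-1)
nn = np.argsort(d2, axis=1)[:, :k]           # k nearest key indices per query
s = np.take_along_axis(S, nn, axis=1)        # only O(N k) scores kept
w = np.exp(s - s.max(axis=1, keepdims=True))
w /= w.sum(axis=1, keepdims=True)
sparse_attn = np.einsum('nk,nkd->nd', w, V[nn])

print(full_attn.shape, sparse_attn.shape)
```

Because the softmax over k well-chosen neighbors captures most of the attention mass, the sparse output closely approximates the full one while the stored scores shrink from N² to Nk.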