Attention mechanisms have become crucially important in deep learning in recent years. These non-local operations, which are similar to traditional patch-based methods in image processing, complement local convolutions. However, computing the full attention matrix is an expensive operation, with a heavy memory and computational load. These limitations constrain network architectures and performance, in particular in the case of high-resolution images. We propose an efficient attention layer based on the stochastic algorithm PatchMatch, which is used to determine approximate nearest neighbors. We refer to our proposed layer as a "Patch-based Stochastic Attention Layer" (PSAL). Furthermore, we propose different approaches, based on patch aggregation, to ensure the differentiability of PSAL, thus allowing end-to-end training of any network containing our layer. PSAL has a small memory footprint and can therefore scale to high-resolution images. It maintains this footprint without sacrificing the spatial precision or global reach of the nearest neighbors, which means that it can be easily inserted at any level of a deep architecture, even in shallower levels. We demonstrate the usefulness of PSAL on several image editing tasks, such as image inpainting and image colorization.
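The contrast motivating PSAL can be sketched in a few lines: full attention stores an N×N similarity matrix, whereas a PatchMatch-style search keeps only one approximate nearest-neighbor index per query, i.e. O(N) memory. The toy function below is a simplified sketch of the random-search step of PatchMatch (the real algorithm also propagates good matches between spatially adjacent patches, which is omitted here); the function names and parameters are illustrative, not the paper's actual implementation.

```python
import numpy as np

def full_attention_memory(n):
    # A full attention layer materializes an n x n similarity matrix.
    return n * n

def patchmatch_nn(queries, keys, iters=4, seed=0):
    """Toy PatchMatch-style search: random initialization followed by
    random-search refinement. Returns one approximate nearest-neighbor
    key index per query, using O(n) memory instead of O(n^2)."""
    rng = np.random.default_rng(seed)
    n = len(queries)
    nn = rng.integers(0, len(keys), size=n)            # random initialization
    best = np.sum((queries - keys[nn]) ** 2, axis=1)   # current squared distances
    for _ in range(iters):
        cand = rng.integers(0, len(keys), size=n)      # random-search candidates
        d = np.sum((queries - keys[cand]) ** 2, axis=1)
        improve = d < best                             # keep candidates that improve
        nn[improve] = cand[improve]
        best[improve] = d[improve]
    return nn
```

Because each iteration touches only O(n) candidate distances, the memory footprint stays linear in the number of patches, which is what allows this style of attention to scale to high-resolution inputs.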