Spatial self-attention layers, in the form of Non-Local blocks, introduce long-range dependencies into Convolutional Neural Networks by computing pairwise similarities among all possible positions. Such pairwise functions underpin the effectiveness of non-local layers, but they also induce a complexity that scales quadratically with the input size, in both space and time. This is a severely limiting factor that, in practice, hinders the applicability of non-local blocks to even moderately sized inputs. Previous works focused on reducing the complexity by modifying the underlying matrix operations; in this work, however, we aim to retain the full expressiveness of non-local layers while keeping their complexity linear. We overcome the efficiency limitation of non-local blocks by framing them as special cases of 3rd-order polynomial functions. This observation enables us to formulate novel fast Non-Local blocks that reduce the complexity from quadratic to linear with no loss in performance, by replacing any direct computation of pairwise similarities with element-wise multiplications. The proposed method, which we dub "Poly-NL", achieves performance competitive with the state of the art across image recognition, instance segmentation, and face detection tasks, while incurring considerably less computational overhead.
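The quadratic-to-linear reduction can be illustrated with a minimal NumPy sketch. This is not the authors' exact Poly-NL formulation (which additionally uses element-wise multiplications); it shows only the underlying principle that once the non-local block is a polynomial in the input, with no nonlinearity between the matrix products, associativity lets us avoid ever materializing the N x N pairwise-similarity matrix. All shapes and weight names below are illustrative assumptions.

```python
import numpy as np

# Illustrative shapes: N spatial positions, d channels (both assumed).
N, d = 1024, 64
rng = np.random.default_rng(0)
X = rng.standard_normal((N, d))    # flattened feature map
W1 = rng.standard_normal((d, d))   # "query" projection (hypothetical)
W2 = rng.standard_normal((d, d))   # "key" projection (hypothetical)

# Quadratic ordering: explicitly build the N x N similarity matrix,
# then aggregate. Cost is O(N^2 d) time and O(N^2) memory.
quad = ((X @ W1) @ (X @ W2).T) @ X

# Linear ordering: regroup the same product so only d x d intermediates
# appear. Cost drops to O(N d^2) time and O(d^2) extra memory.
lin = (X @ W1) @ ((X @ W2).T @ X)

# Both orderings compute the same polynomial function of X.
assert np.allclose(quad, lin)
print(quad.shape)  # (1024, 64)
```

The regrouping is exact only because no softmax or other nonlinearity sits between the products, which is precisely the setting opened up by viewing non-local blocks as polynomial functions of the input.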