Attention mechanisms in biological perception are thought to select subsets of perceptual information for more sophisticated processing that would be prohibitively expensive to perform on all sensory inputs. In computer vision, however, there has been relatively little exploration of hard attention, where some information is selectively ignored, despite the success of soft attention, where information is re-weighted and aggregated but never filtered out. Here, we introduce a new approach for hard attention and find that it achieves very competitive performance on recently released visual question answering datasets, equalling and in some cases surpassing similar soft attention architectures while entirely ignoring some features. Even though the hard attention mechanism is thought to be non-differentiable, we find that feature magnitudes correlate with semantic relevance and provide a useful signal for our mechanism's attentional selection criterion. Because hard attention selects only the important features of the input, it can also be more efficient than analogous soft attention mechanisms. This is especially important for recent approaches that use non-local pairwise operations, whose computational and memory costs are quadratic in the size of the set of features.
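To make the selection idea concrete, the following is a minimal sketch of magnitude-based hard attention. It assumes that features are ranked by their L2 norm and that a fixed number k of them is kept; the abstract states only that feature magnitudes provide the selection signal, so the exact criterion, the function names `hard_attention_topk` and `pairwise_cost`, and the toy feature values are all illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def hard_attention_topk(features, k):
    """Keep the k feature vectors with the largest L2 norm and drop the rest.

    features: array of shape (n, d), e.g. n spatial cells of a CNN feature map.
    Returns (selected, indices), where selected has shape (k, d).
    Assumption: top-k by L2 norm stands in for the selection criterion.
    """
    norms = np.linalg.norm(features, axis=1)     # one magnitude per feature vector
    idx = np.argsort(-norms, kind="stable")[:k]  # indices of the k largest norms
    return features[idx], idx


def pairwise_cost(n):
    """Number of ordered pairs a non-local pairwise operation visits for n features."""
    return n * n


# Hypothetical feature map: 4 cells, 2 channels each.
feats = np.array([[0.1, 0.0], [3.0, 4.0], [0.0, 1.0], [6.0, 8.0]])
selected, idx = hard_attention_topk(feats, k=2)
```

In this toy case, keeping 2 of 4 features reduces the number of pairs from `pairwise_cost(4)` to `pairwise_cost(2)`, which illustrates why discarding features matters when the downstream cost is quadratic in the size of the feature set.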