In human vision, visual attention helps achieve robust perception under noise, corruption, and distribution shifts, areas where modern neural networks still fall short. We present VARS, Visual Attention from Recurrent Sparse reconstruction, a new attention formulation built on two prominent features of the human visual attention mechanism: recurrency and sparsity. Related features are grouped together via recurrent connections between neurons, and salient objects emerge via sparse regularization. VARS adopts an attractor network with recurrent connections that converges toward a stable pattern over time. Network layers are represented as ordinary differential equations (ODEs), formulating attention as a recurrent attractor network that equivalently optimizes the sparse reconstruction of the input using a dictionary of "templates" encoding underlying patterns in the data. We show that self-attention is a special case of VARS with a single-step optimization and no sparsity constraint. VARS can be readily used as a replacement for self-attention in popular vision transformers, consistently improving their robustness across various benchmarks. Code is released on GitHub (https://github.com/bfshi/VARS).
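To make the idea of attention as recurrent sparse reconstruction concrete, below is a minimal, illustrative sketch (not the authors' exact formulation): the keys are treated as a dictionary, the attention coefficients are refined over several gradient steps on a reconstruction objective with a soft-thresholding step that encourages sparsity, and with zero iterations and no sparsity penalty the routine reduces to ordinary softmax self-attention. The initialization, step size, and threshold parameter are assumptions chosen for readability, not values from the paper.

```python
import torch
import torch.nn.functional as F


def vars_style_attention(q, k, v, num_steps=5, lam=0.1, step_size=0.5):
    """Toy sketch of attention as iterative sparse reconstruction.

    q, k, v: tensors of shape (batch, tokens, dim). The keys act as a
    dictionary; the attention coefficients z are refined over several
    ISTA-like steps instead of being computed in a single pass.
    NOTE: hyperparameters and initialization here are illustrative
    assumptions, not the configuration used in the VARS paper.
    """
    scale = q.shape[-1] ** -0.5
    # Initialize coefficients from the usual scaled dot-product similarity.
    z = F.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    for _ in range(num_steps):
        # Gradient of the reconstruction error ||q - z k||^2 w.r.t. z.
        residual = q - z @ k
        grad = -residual @ k.transpose(-2, -1)
        z = z - step_size * grad
        # Soft-thresholding promotes sparse attention maps.
        z = torch.relu(z.abs() - lam) * z.sign()
    return z @ v


# With num_steps=0 and lam=0 this reduces to plain softmax self-attention.
x = torch.randn(2, 16, 64)
out = vars_style_attention(x, x, x)
print(out.shape)  # torch.Size([2, 16, 64])
```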