Variational Structured Attention Networks for Deep Visual Representation Learning

Convolutional neural networks have enabled major progress in addressing pixel-level prediction tasks such as semantic segmentation, depth estimation, and surface normal prediction, benefiting from their powerful capabilities in visual representation learning. Typically, state-of-the-art models integrate attention mechanisms for improved deep feature representations. Recently, some works have demonstrated the significance of learning and combining both spatial and channel-wise attention for deep feature refinement. In this paper, we aim at effectively boosting previous approaches and propose a unified deep framework to jointly learn both spatial attention maps and channel attention vectors in a principled manner, so as to structure the resulting attention tensors and model interactions between these two types of attention. Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework, leading to VarIational STructured Attention networks (VISTA-Net). We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters. As demonstrated by our extensive empirical evaluation on six large-scale datasets for dense visual prediction, VISTA-Net outperforms the state of the art in multiple continuous and discrete prediction tasks, thus confirming the benefit of the proposed approach in joint structured spatial-channel attention estimation for deep representation learning. The code is available at https://github.com/ygjwd12345/VISTA-Net.
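To make the idea of a structured attention tensor more concrete, here is a minimal sketch that is not taken from the VISTA-Net code. It assumes the attention tensor is the outer product of one spatial map and one channel vector, and it leaves out the probabilistic inference the abstract describes. The function name `structured_attention`, the sigmoid on the spatial logits, and the softmax on the channel logits are all illustrative assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def structured_attention(features, spatial_logits, channel_logits):
    """Gate a C x H x W feature tensor (nested lists) with a rank-1 attention tensor.

    Hypothetical helper: a[c][i][j] = v[c] * m[i][j], where m is a spatial map
    (H x W) and v is a channel vector (length C). The activations below are
    assumptions made for illustration only.
    """
    spatial_map = [[sigmoid(s) for s in row] for row in spatial_logits]  # values in (0, 1)
    channel_vec = softmax(channel_logits)  # sums to 1 over channels
    return [
        [
            [channel_vec[c] * spatial_map[i][j] * plane[i][j] for j in range(len(plane[i]))]
            for i in range(len(plane))
        ]
        for c, plane in enumerate(features)
    ]
```

With all logits set to zero, every spatial weight is 0.5 and, for two channels, every channel weight is 0.5, so each feature value is scaled by 0.25. The point of the factorized form is that the channel vector and the spatial map interact through a single product, not through two independent gates applied one after the other.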

Related content

Attention mechanism

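The attention mechanism was first proposed in the field of computer vision, but it really took off with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied attention to an RNN model for image classification. Subsequently, Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate" [1], used an attention-like mechanism to perform translation and alignment simultaneously on a machine translation task; their work is generally regarded as the first to apply attention to NLP. Similar attention-based extensions of RNN models were then applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the general trend of attention research.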

AAAI 2021 | Heterogeneous Graph Structure Learning for Graph Neural Networks

Zhuanzhi member service

92+ reads · January 20, 2021

[ICML 2020] Confidence-Aware Learning for Deep Neural Networks

Zhuanzhi member service

74+ reads · July 6, 2020

[Google] Big Transfer: General Visual Representation Learning

Zhuanzhi member service

37+ reads · May 9, 2020

Learning Representations for Images with Hierarchical Labels

Zhuanzhi member service

38+ reads · April 6, 2020