This paper proposes Omnidirectional Representations from Transformers (OmniNet). In OmniNet, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to all tokens in the entire network. This process can also be interpreted as a form of extreme or intensive attention mechanism whose receptive field spans the entire width and depth of the network. To this end, the omnidirectional attention is learned via a meta-learner, which is essentially another self-attention based model. To mitigate the prohibitive computational cost of full receptive field attention, we leverage efficient self-attention models such as kernel-based attention (Choromanski et al.), low-rank attention (Wang et al.), and/or Big Bird (Zaheer et al.) as the meta-learner. Extensive experiments are conducted on autoregressive language modeling (LM1B, C4), Machine Translation, Long Range Arena (LRA), and Image Recognition. The experiments show that OmniNet achieves considerable improvements across these tasks, including state-of-the-art performance on LM1B, WMT'14 En-De/En-Fr, and Long Range Arena. Moreover, using omnidirectional representations in Vision Transformers leads to significant improvements on image recognition tasks in both few-shot learning and fine-tuning setups.
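To make the idea concrete, the sketch below shows one way omnidirectional attention could be wired up: the hidden states from every layer are flattened into a single sequence so that each token can attend across the full width and depth of the network, and a meta-learner attention module operates over that sequence. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses plain PyTorch multi-head attention as a stand-in for the efficient meta-learners (kernel-based, low-rank, or Big Bird attention) named above, and the class name `OmniAttentionSketch`, the choice of top-layer queries, and the residual combination are illustrative.

```python
# Minimal sketch of omnidirectional attention (assumptions noted in comments).
import torch
import torch.nn as nn
from typing import List


class OmniAttentionSketch(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Meta-learner: plain softmax attention here; the paper swaps in
        # kernel-based, low-rank, or Big Bird attention to reduce the
        # O((L*N)^2) cost of the full receptive field.
        self.meta_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, layer_states: List[torch.Tensor]) -> torch.Tensor:
        # layer_states: L tensors, each [batch, seq, dim], holding the token
        # representations produced by every Transformer layer.
        # Flatten width (sequence) and depth (layers) into one axis so that
        # every token can attend to every token in the entire network.
        all_tokens = torch.cat(layer_states, dim=1)   # [batch, L*seq, dim]
        queries = layer_states[-1]                    # query from the top layer (an assumption)
        omni, _ = self.meta_attn(queries, all_tokens, all_tokens)
        # Combine the omnidirectional representation with the final layer output.
        return queries + omni                         # [batch, seq, dim]


if __name__ == "__main__":
    # Toy usage: 3 layers of hidden states, batch of 2, 8 tokens, dim 16.
    states = [torch.randn(2, 8, 16) for _ in range(3)]
    out = OmniAttentionSketch(dim=16)(states)
    print(out.shape)  # torch.Size([2, 8, 16])
```

With standard attention this flattening multiplies the attention cost by the number of layers, which is why an efficient attention model is used as the meta-learner in practice.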