Attention plays a fundamental role in both natural and artificial intelligence systems. In deep learning, attention-based neural architectures, such as transformers, are widely used to tackle problems in natural language processing and beyond. Here we investigate the fundamental building blocks of attention and their computational properties. Within the standard model of deep learning, we classify all possible fundamental building blocks of attention in terms of their source, target, and computational mechanism. We identify and study the three most important mechanisms: additive activation attention, multiplicative output attention (output gating), and multiplicative synaptic attention (synaptic gating). The gating mechanisms correspond to multiplicative extensions of the standard model and are used across all current attention-based deep learning architectures. We study their functional properties and estimate the capacity of several attentional building blocks in the case of linear and polynomial threshold gates. Surprisingly, additive activation attention plays a central role in the proofs of the lower bounds. Attention mechanisms reduce the depth of certain basic circuits and leverage the power of quadratic activations without incurring their full cost.
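The following minimal sketch (not from the paper; all function names and the choice of a tanh nonlinearity are illustrative assumptions) shows one way to read the three mechanisms for a single linear unit with weights w, bias b, input x, and an attending signal a produced elsewhere in the network: the attending signal is added to the activation, multiplies the output, or rescales the synaptic weights.

```python
import numpy as np

def additive_activation_attention(w, b, x, a):
    # Additive activation attention: the attending signal is added to the
    # unit's activation before the nonlinearity.
    return np.tanh(w @ x + b + a)

def output_gating(w, b, x, a):
    # Multiplicative output attention: the attending signal multiplies the
    # unit's output.
    return a * np.tanh(w @ x + b)

def synaptic_gating(w, b, x, a):
    # Multiplicative synaptic attention: the attending signal (one value per
    # synapse) dynamically rescales the weights before they are applied.
    return np.tanh((a * w) @ x + b)

# Toy usage with random inputs and attending signals.
rng = np.random.default_rng(0)
x = rng.normal(size=4)
w = rng.normal(size=4)
b = 0.1
print(additive_activation_attention(w, b, x, a=0.5))
print(output_gating(w, b, x, a=0.5))
print(synaptic_gating(w, b, x, a=rng.normal(size=4)))
```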