Attention is a powerful component of modern neural networks across a wide variety of domains. In this paper, we seek to quantify the regularity (i.e., the degree of smoothness) of the attention operation. To accomplish this goal, we propose a new mathematical framework that uses measure theory and integral operators to model attention. We show that this framework is consistent with the usual definition of attention and that it captures its essential properties. We then use this framework to prove that, on compact domains, the attention operation is Lipschitz continuous, and we provide an estimate of its Lipschitz constant. Additionally, by focusing on a specific type of attention, we extend these Lipschitz continuity results to non-compact domains. We also discuss the effects that regularity can have on NLP models, as well as applications to invertible and infinitely-deep networks.
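For reference, the usual (finite) softmax attention that the abstract says the framework is consistent with maps a query $q$ and key-value pairs $(k_j, v_j)$, $j = 1, \dots, n$, in $\mathbb{R}^d$ to
\[
\mathrm{Att}(q) \;=\; \sum_{j=1}^{n} \frac{e^{\langle q, k_j \rangle / \sqrt{d}}}{\sum_{l=1}^{n} e^{\langle q, k_l \rangle / \sqrt{d}}} \, v_j .
\]
A natural integral-operator analogue, given here only as a sketch of how such a measure-theoretic framework might read (the probability measure $\mu$ over keys and the value map $V$ are our own notation, not necessarily the paper's exact formulation), replaces the empirical sum over keys with integration against $\mu$:
\[
\mathrm{Att}_{\mu}(q) \;=\; \frac{\int V(k)\, e^{\langle q, k \rangle / \sqrt{d}} \, d\mu(k)}{\int e^{\langle q, k \rangle / \sqrt{d}} \, d\mu(k)} .
\]
Taking $\mu$ to be the empirical measure $\frac{1}{n}\sum_{j} \delta_{k_j}$ recovers the finite formula above, which is the sense in which an integral-operator model can be consistent with the usual definition.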