The Transformer is a powerful tool for many natural language tasks. It is built on self-attention, a mechanism that encodes, for each token, its dependence on every other token; however, computing self-attention is a bottleneck due to its quadratic time complexity. Various approaches have been proposed to reduce this cost, and matrix approximation is one of them. In Nystr\"omformer, the authors applied a Nystr\"om-based method to approximate the softmax attention matrix. The Nystr\"om method generates a fast approximation to any large-scale symmetric positive semidefinite (SPSD) matrix using only a few of its columns. However, since the Nystr\"om approximation is low-rank, it is of low accuracy when the spectrum of the SPSD matrix decays slowly. Here an alternative approximation method is proposed that has a much stronger error bound than the Nystr\"om method, while retaining the same $O(n)$ time complexity as Nystr\"omformer.
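To make the standard Nystr\"om method concrete, the following is a minimal NumPy sketch (not the paper's proposed method): sample $m$ landmark columns $C$ of an SPSD matrix $K$, take the $m \times m$ intersection block $W$, and form the approximation $\hat{K} = C W^{+} C^{\top}$. The matrix sizes and sampling scheme here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build an SPSD Gram matrix K = X X^T (rank at most 10 by construction).
n, m = 200, 20
X = rng.standard_normal((n, 10))
K = X @ X.T

# Nystrom approximation: sample m landmark columns uniformly at random.
idx = rng.choice(n, size=m, replace=False)
C = K[:, idx]            # n x m: the sampled columns
W = K[np.ix_(idx, idx)]  # m x m: intersection of sampled rows and columns

# K_hat = C W^+ C^T, using the pseudoinverse of the intersection block.
K_approx = C @ np.linalg.pinv(W) @ C.T

rel_err = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
```

Because $K$ here has rank at most 10 and 20 landmarks are sampled, the approximation is essentially exact; when the spectrum of $K$ decays slowly, the same rank-$m$ approximation becomes inaccurate, which is the weakness the abstract refers to.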