It is well known that coordinate-based MLPs benefit greatly, in terms of preserving high-frequency information, from encoding coordinate positions as an array of Fourier features. Hitherto, the rationale for the effectiveness of these positional encodings has been studied solely through a Fourier lens. In this paper, we strive to broaden this understanding by showing that alternative non-Fourier embedding functions can indeed be used for positional encoding. Moreover, we show that their performance is entirely determined by a trade-off between the stable rank of the embedded matrix and the distance preservation between embedded coordinates. We further establish that the now ubiquitous Fourier feature mapping of position is a special case that fulfills these conditions. Consequently, we present a more general theory to analyze positional encoding in terms of shifted basis functions. To this end, we develop the necessary theoretical formulae and empirically verify that our theoretical claims hold in practice. Code is available at https://github.com/osiriszjq/Rethinking-positional-encoding.
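The two quantities named in the abstract, the stable rank of the embedded matrix and the distances between embedded coordinates, can be computed for any embedder. The following is a minimal sketch and not the authors' implementation. The function names, the frequency schedule, the number of centers, and the Gaussian width `sigma` are all illustrative assumptions. Stable rank is taken in its standard form, the squared Frobenius norm divided by the squared spectral norm.

```python
import numpy as np


def fourier_embed(x, num_freqs=8):
    """Embed 1-D coordinates with sin/cos features (assumed frequencies 2^k * pi)."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    angles = x[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


def gaussian_embed(x, num_centers=16, sigma=0.05):
    """Embed 1-D coordinates with shifted Gaussian bumps (an assumed non-Fourier embedder)."""
    centers = np.linspace(0.0, 1.0, num_centers)
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / sigma) ** 2)


def stable_rank(matrix):
    """Stable rank: squared Frobenius norm divided by squared spectral norm."""
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    return float(np.sum(singular_values ** 2) / singular_values[0] ** 2)


def embedded_distances(matrix):
    """Pairwise Euclidean distances between embedded coordinates (rows of the matrix)."""
    diffs = matrix[:, None, :] - matrix[None, :, :]
    return np.linalg.norm(diffs, axis=-1)


if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 64)  # sample coordinates in [0, 1]
    for name, emb in [("fourier", fourier_embed(x)), ("gaussian", gaussian_embed(x))]:
        dists = embedded_distances(emb)
        # Print the embedding shape, its stable rank, and the distance
        # between the first two neighbouring coordinates after embedding.
        print(name, emb.shape, stable_rank(emb), dists[0, 1])
```

Varying `sigma` or `num_freqs` in this sketch is one way to see how the two quantities change together for a given embedder; the specific values above carry no significance.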