Position encoding has recently proven effective in the transformer architecture. It provides valuable supervision for dependency modeling between elements at different positions of the sequence. In this paper, we first investigate various methods to integrate positional information into the learning process of transformer-based language models. Then, we propose a novel method named Rotary Position Embedding (RoPE) to effectively leverage the positional information. Specifically, the proposed RoPE encodes the absolute position with a rotation matrix and meanwhile incorporates the explicit relative position dependency in the self-attention formulation. Notably, RoPE offers valuable properties, including flexibility with respect to sequence length, inter-token dependency that decays with increasing relative distance, and the capability of equipping linear self-attention with relative position encoding. Finally, we evaluate the enhanced transformer with rotary position embedding, also called RoFormer, on various long-text classification benchmark datasets. Our experiments show that it consistently outperforms its alternatives. Furthermore, we provide a theoretical analysis to explain some of the experimental results. RoFormer is already integrated into Huggingface: \url{https://huggingface.co/docs/transformers/model_doc/roformer}.
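For intuition, in the two-dimensional case the rotary construction reduces to an ordinary rotation matrix; the following identity (a sketch in notation chosen here, with $\theta$ a fixed frequency and $m$, $n$ absolute positions, not quoted verbatim from the paper) shows why the resulting attention score depends only on the relative offset $n-m$:
\[
f(\mathbf{x}, m) = R_m\mathbf{x},
\qquad
R_m = \begin{pmatrix} \cos m\theta & -\sin m\theta \\ \sin m\theta & \cos m\theta \end{pmatrix},
\qquad
\langle f(\mathbf{q}, m),\, f(\mathbf{k}, n) \rangle
= \mathbf{q}^{\top} R_m^{\top} R_n \mathbf{k}
= \langle \mathbf{q},\, R_{n-m}\mathbf{k} \rangle ,
\]
since $R_m^{\top} R_n = R_{n-m}$ for rotation matrices. In higher dimensions this rotation is applied blockwise to consecutive pairs of features, with a decreasing frequency per pair, which yields the absolute-position encoding with explicit relative-position dependency described above.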