Title: Greedy Ordering of Layer Weight Matrices in Transformers Improves Translation

Elicia Ye

From arXiv. The paper contains an error in the implementation of the algorithm.

Prior work has attempted to understand the internal structures and functionalities of Transformer-based encoder-decoder architectures at the level of multi-head attention and feed-forward sublayers. Interpretations have focused on the encoder and decoder, along with the combinatorial possibilities of the self-attention, cross-attention, and feed-forward sublayers. However, without examining the low-level structures, one gains limited understanding of the motivation behind sublayer reordering. Could we dive into the sublayer abstraction and permute layer weight matrices to improve the quality of translation? We propose AEIUOrder to greedily reorder layer weight matrices in the encoder by their well-trainedness, as measured by Heavy-Tailed Self-Regularization (HT-SR) metrics, and order the decoder matrices correspondingly. Our results suggest that greedily reordering layer weight matrices to maximize Total well-trainedness helps the model learn representations and generate translations more effectively.
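
The reordering idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say which HT-SR metric is used, which direction counts as better trained, or how the decoder order is derived. The log Frobenius norm score, the descending order, the reuse of the encoder permutation for the decoder, and all function names below are assumptions made for illustration only.

```python
import math
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def log_frobenius_norm(w: Matrix) -> float:
    """Placeholder well-trainedness score (an assumption, not the paper's metric)."""
    return math.log10(math.sqrt(sum(x * x for row in w for x in row)))


def greedy_order(scores: Sequence[float]) -> List[int]:
    """Fill positions one at a time with the highest-scoring remaining layer."""
    remaining = list(range(len(scores)))
    order: List[int] = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        order.append(best)
        remaining.remove(best)
    return order


def reorder(
    encoder: Sequence[Matrix],
    decoder: Sequence,
    score: Callable[[Matrix], float] = log_frobenius_norm,
) -> Tuple[list, list, List[int]]:
    """Order encoder matrices by score and apply the same permutation to the decoder.

    Assumes encoder and decoder have the same number of layers, and that
    "correspondingly" means reusing the encoder permutation.
    """
    order = greedy_order([score(w) for w in encoder])
    return [encoder[i] for i in order], [decoder[i] for i in order], order


# Toy 2x2 "layers"; the decoder entries are labels standing in for matrices.
encoder = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[3.0, 0.0], [0.0, 4.0]],
    [[2.0, 0.0], [0.0, 0.0]],
]
decoder = ["dec0", "dec1", "dec2"]
new_encoder, new_decoder, order = reorder(encoder, decoder)
print(order)        # [1, 2, 0]
print(new_decoder)  # ['dec1', 'dec2', 'dec0']
```

Because the score is passed in as a function, a different metric (for example a power-law exponent fitted to each matrix's eigenvalue spectrum) could be substituted without changing the ordering logic.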
