Feed-forward layers constitute two-thirds of a transformer model's parameters, yet their role in the network remains under-explored. We show that feed-forward layers in transformer-based language models operate as key-value memories, where each key correlates with textual patterns in the training examples, and each value induces a distribution over the output vocabulary. Our experiments show that the learned patterns are human-interpretable, and that lower layers tend to capture shallow patterns, while upper layers learn more semantic ones. The values complement the keys' input patterns by inducing output distributions that concentrate probability mass on tokens likely to appear immediately after each pattern, particularly in the upper layers. Finally, we demonstrate that the output of a feed-forward layer is a composition of its memories, which is subsequently refined throughout the model's layers via residual connections to produce the final output distribution.
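The key-value view described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: matrix sizes, the ReLU activation, and the output-embedding matrix `E` are assumptions chosen for a toy example. Each row of `K` acts as a key matched against the hidden state, each row of `V` as a value, and the layer output is a weighted sum (composition) of the values; projecting a single value through the embedding matrix yields the induced distribution over the vocabulary.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, vocab = 16, 64, 100  # toy dimensions (assumptions)

# Feed-forward layer FF(x) = f(x K^T) V, viewed as a key-value memory.
K = rng.standard_normal((d_ff, d_model))   # keys: one per memory cell
V = rng.standard_normal((d_ff, d_model))   # values: one per memory cell
E = rng.standard_normal((vocab, d_model))  # output embedding (hypothetical)

x = rng.standard_normal(d_model)           # hidden state at one position

# Memory coefficients: how strongly each key's pattern fires on this input.
coeffs = np.maximum(x @ K.T, 0.0)          # f = ReLU in this sketch

# The layer output is a weighted sum (composition) of the value vectors.
ff_out = coeffs @ V

# A single value induces a distribution over the output vocabulary when
# projected through the embedding matrix and normalized with softmax.
def value_distribution(v):
    logits = E @ v
    p = np.exp(logits - logits.max())
    return p / p.sum()

dist = value_distribution(V[0])
```

In the full model, `ff_out` is added back into the residual stream, so later layers can refine this composition of memories into the final output distribution.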