Transformer language models have made tremendous strides in natural language understanding tasks. However, the complexity of natural language makes it challenging to ascertain how accurately these models are tracking the world state underlying the text. Motivated by this issue, we consider the task of language modeling for the game of chess. Unlike natural language, chess notation describes a simple, constrained, and deterministic domain. Moreover, we observe that the appropriate choice of chess notation allows for directly probing the world state, without requiring any additional probing-related machinery. We find that: (a) With enough training data, transformer language models can learn to track pieces and predict legal moves with high accuracy when trained solely on move sequences. (b) For small training sets, providing access to board state information during training can yield significant improvements. (c) The success of transformer language models is dependent on access to the entire game history, i.e., "full attention". Approximating this full attention results in a significant performance drop. We propose this testbed as a benchmark for future work on the development and analysis of transformer language models.
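To make the probing idea concrete, the sketch below (not the paper's exact pipeline) assumes UCI-style move notation, where each move token spells out its starting and ending squares, and uses the python-chess library as a ground-truth state tracker. The game prefix and probe square are illustrative: prompting a trained language model with a game prefix followed by a starting square asks it to produce a legal ending square, which is exactly the kind of world-state query the abstract refers to.

```python
# Minimal sketch: why start-square/end-square (UCI-style) notation allows
# direct probing of board state, assuming the python-chess package.
import chess

game_prefix = ["e2e4", "e7e5", "g1f3", "b8c6"]  # hypothetical game prefix

board = chess.Board()
for move in game_prefix:
    board.push_uci(move)  # ground-truth world-state tracker

# Probing prompt: the game prefix followed by a starting square, e.g. "f3".
# A language model trained only on move sequences should complete it with a
# legal ending square; the ground truth computed here is the reference.
probe_square = chess.F3
legal_endings = sorted(
    chess.square_name(m.to_square)
    for m in board.legal_moves
    if m.from_square == probe_square
)
print("piece on f3:", board.piece_at(probe_square))  # white knight
print("legal ending squares:", legal_endings)        # e.g. ['d4', 'e5', ...]
```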