Language models show a surprising range of capabilities, but the source of their apparent competence is unclear. Do these networks just memorize a collection of surface statistics, or do they rely on internal representations of the process that generates the sequences they see? We investigate this question by applying a variant of the GPT model to the task of predicting legal moves in a simple board game, Othello. Although the network has no a priori knowledge of the game or its rules, we uncover evidence of an emergent nonlinear internal representation of the board state. Interventional experiments indicate this representation can be used to control the output of the network and create "latent saliency maps" that can help explain predictions in human terms.