What is learned by sophisticated neural network agents such as AlphaZero? This question is of both scientific and practical interest. If the representations of strong neural networks bear no resemblance to human concepts, our ability to understand faithful explanations of their decisions will be restricted, ultimately limiting what we can achieve with neural network interpretability. In this work we provide evidence that human knowledge is acquired by the AlphaZero neural network as it trains on the game of chess. By probing for a broad range of human chess concepts we show when and where these concepts are represented in the AlphaZero network. We also provide a behavioural analysis focusing on opening play, including qualitative analysis from chess Grandmaster Vladimir Kramnik. Finally, we carry out a preliminary investigation looking at the low-level details of AlphaZero's representations, and make the resulting behavioural and representational analyses available online.