Self-trained autonomous agents developed using machine learning are showing great promise in a variety of control settings, perhaps most remarkably in applications involving autonomous vehicles. The main challenge associated with self-learned agents in the form of deep neural networks is their black-box nature: the internal workings of a deep neural network are, in practice, uninterpretable to humans. Consequently, humans cannot directly interpret the actions of deep-neural-network-based agents or foresee how robust they will be in different scenarios. In this work, we demonstrate a method for probing which concepts self-learning agents internalise in the course of their training. For demonstration, we use a chess-playing agent in a fast and lightweight environment developed specifically to be suitable for research groups without access to enormous computational resources or machine-learning models.
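To make the idea of concept probing concrete, the sketch below illustrates one common way such probes are built: a simple linear classifier fitted on activations extracted from a hidden layer of the agent's network, with held-out accuracy indicating whether a chess concept is linearly decodable from that layer. This is a minimal, assumption-laden illustration; the activation data, layer choice, and concept labels here are hypothetical placeholders, not the specific setup used in this work.

```python
# Minimal sketch of a linear concept probe (illustrative only; all data are stand-ins).
# Given activations from one hidden layer of the agent's network and binary labels
# saying whether a concept (e.g. "side to move has a passed pawn") holds in each
# position, fit a linear classifier and report held-out accuracy.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-ins for real data: activations of shape (n_positions, n_features)
# and concept labels of shape (n_positions,).
activations = rng.normal(size=(1000, 256))
concept_labels = rng.integers(0, 2, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    activations, concept_labels, test_size=0.2, random_state=0
)

probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

# High probe accuracy suggests the concept is linearly decodable from this layer;
# tracking this across training checkpoints shows when the concept is internalised.
print("probe accuracy:", probe.score(X_test, y_test))
```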