Consider a reinforcement learning problem in which an agent has access to a very large amount of information about its environment but can take only a few actions to accomplish its task and maximize its reward. The main problem for the agent is then to learn a map from a very high-dimensional space (which represents its environment) to a very low-dimensional space (which represents its actions). Such a high-to-low dimensional map implies that most of the information about the environment is irrelevant for the actions to be taken, and only a small fraction of it is relevant. In this paper we argue that the relevant information need not be learned by brute force (which is the standard approach), but can be identified from the intrinsic symmetries of the system. We analyze in detail a reinforcement learning problem of autonomous driving, where the corresponding symmetry is the Galilean symmetry, and argue that the learning task can be accomplished with very few relevant parameters, or, more precisely, invariants. As a numerical demonstration, we show that autonomous vehicles (which we call autonomous particles, since they describe very primitive vehicles) need only four relevant invariants to learn to drive very well without colliding with other particles. The simple model can be easily generalized to include different types of particles (e.g. cars, pedestrians, buildings, road signs, etc.) with different types of relevant invariants describing the interactions between them. We also argue that there must exist a field theory description of the learning system in which autonomous particles are described by fermionic degrees of freedom and the interactions mediated by the relevant invariants are described by bosonic degrees of freedom.
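To make the notion of a Galilean invariant concrete, the following minimal sketch builds four scalars for a pair of particles in two dimensions from their relative position and relative velocity, and checks that the scalars are unchanged by a rotation, a translation and a velocity boost. The function names and the particular choice of scalars are assumptions made for illustration; they are not necessarily the four invariants used in the paper.

```python
import math


def pair_invariants(x_a, v_a, x_b, v_b):
    """Four scalars built from the relative position r and relative velocity u (2D).

    Illustrative choice only; not necessarily the invariants used in the paper.
    """
    rx, ry = x_b[0] - x_a[0], x_b[1] - x_a[1]
    ux, uy = v_b[0] - v_a[0], v_b[1] - v_a[1]
    return (
        rx * rx + ry * ry,  # squared separation |r|^2
        ux * ux + uy * uy,  # squared relative speed |u|^2
        rx * ux + ry * uy,  # r . u (negative when the pair is approaching)
        rx * uy - ry * ux,  # r x u (unchanged by rotations, flips sign under reflection)
    )


def galilean_transform(x, v, angle, shift, boost, t):
    """Apply x' = R x + shift + boost * t and v' = R v + boost, with R a rotation by `angle`."""
    c, s = math.cos(angle), math.sin(angle)
    x_rot = (c * x[0] - s * x[1], s * x[0] + c * x[1])
    v_rot = (c * v[0] - s * v[1], s * v[0] + c * v[1])
    x_new = (x_rot[0] + shift[0] + boost[0] * t, x_rot[1] + shift[1] + boost[1] * t)
    v_new = (v_rot[0] + boost[0], v_rot[1] + boost[1])
    return x_new, v_new


# Two hypothetical particles: positions and velocities in some reference frame.
x_a, v_a = (0.0, 0.0), (1.0, 0.0)
x_b, v_b = (3.0, 4.0), (0.0, -2.0)
before = pair_invariants(x_a, v_a, x_b, v_b)

# The same pair seen from a rotated, shifted and uniformly moving frame.
frame = dict(angle=0.7, shift=(5.0, -2.0), boost=(10.0, 3.0), t=1.5)
x_a2, v_a2 = galilean_transform(x_a, v_a, **frame)
x_b2, v_b2 = galilean_transform(x_b, v_b, **frame)
after = pair_invariants(x_a2, v_a2, x_b2, v_b2)

# The four scalars agree up to floating-point rounding.
assert all(math.isclose(p, q, abs_tol=1e-9) for p, q in zip(before, after))
```

Because the translation and the boost cancel in the differences and the rotation preserves dot and cross products, a policy that takes only such scalars as input sees the same values in every inertial frame. This is one way to read the claim that a few invariants can replace the full high-dimensional state.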