We study the action generalization ability of deep Q-learning in discrete action spaces. Generalization is crucial for efficient reinforcement learning (RL) because it allows agents to apply knowledge learned from past experience to new tasks. But while function approximation provides deep RL agents with a natural way to generalize over state inputs, the same generalization mechanism does not apply to discrete action outputs. And yet, surprisingly, our experiments indicate that Deep Q-Networks (DQN), which use exactly this type of function approximator, are still able to achieve modest action generalization. Our main contribution is twofold: first, we propose a method of evaluating action generalization using expert knowledge of action similarity, and empirically confirm that action generalization leads to faster learning; second, we characterize the action-generalization gap (the difference in learning performance between DQN and the expert) in different domains. We find that DQN can indeed generalize over actions in several simple domains, but that its ability to do so decreases as the action space grows larger.
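To make the idea of an action-similarity expert concrete, the sketch below shows one possible way such expert knowledge could be used: a tabular Q-learning update that propagates each TD update to actions a hand-specified similarity matrix marks as related, versus a vanilla update that touches only the chosen action. This is an illustrative assumption, not the paper's exact procedure; the toy sizes, the `similarity` values, and the `q_update` helper are all made up for illustration.

```python
import numpy as np

n_states, n_actions = 10, 4

# Assumed expert knowledge of action similarity: similarity[a, b] in [0, 1],
# with similarity[a, a] = 1. Here, adjacent action indices count as "similar".
similarity = np.eye(n_actions)
for a in range(n_actions - 1):
    similarity[a, a + 1] = similarity[a + 1, a] = 0.5

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, share=None):
    """One tabular Q-learning step; if `share` (a row-normalized similarity
    matrix) is given, the TD update is also propagated to similar actions."""
    td = r + gamma * Q[s_next].max() - Q[s, a]
    if share is None:
        Q[s, a] += alpha * td              # vanilla: update one action only
    else:
        Q[s] += alpha * td * share[a]      # expert: generalize over actions
    return Q

# Example: the same transition (s=0, a=1, r=1.0, s'=2) under both learners.
Q_vanilla = q_update(np.zeros((n_states, n_actions)), 0, 1, 1.0, 2)
Q_expert = q_update(np.zeros((n_states, n_actions)), 0, 1, 1.0, 2,
                    share=similarity / similarity.sum(axis=1, keepdims=True))
```

Under this kind of scheme, a single experience updates the values of several related actions at once, which is one intuition for why an expert with action-similarity knowledge can learn faster than a learner that treats actions as unrelated outputs.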