We focus on reinforcement learning (RL) in relational problems that are naturally defined in terms of objects, their relations, and the ways they can be manipulated. These problems are characterized by variable state and action spaces, and finding a fixed-length representation, required by most existing RL methods, is difficult, if not impossible. We present a deep RL framework based on graph neural networks and auto-regressive policy decomposition that naturally works with these problems and is completely domain-independent. We demonstrate the framework in three very distinct domains. In goal-oriented BlockWorld, we demonstrate multi-parameter actions with pre-conditions. In SysAdmin, we show how to select multiple objects simultaneously. In all three domains, we report the method's competitive performance and impressive zero-shot generalization across different problem sizes. For example, in the classical planning domain of Sokoban, the method, trained exclusively on 10x10 problems with three boxes, solves 89% of 15x15 problems with five boxes.
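To make the two ideas named above concrete, the following is a minimal, hypothetical sketch (not the authors' implementation): a graph neural network produces per-object embeddings of the state, and the policy is decomposed auto-regressively, first sampling an action type from a pooled graph embedding and then sampling the object parameter conditioned on that type. All module names, dimensions, and the single-round mean-aggregation GNN are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SimpleGNN(nn.Module):
    """One round of mean-aggregation message passing over an adjacency matrix."""

    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.self_lin = nn.Linear(in_dim, hid_dim)
        self.neigh_lin = nn.Linear(in_dim, hid_dim)

    def forward(self, node_feats, adj):
        # node_feats: (num_objects, in_dim), adj: (num_objects, num_objects) 0/1 matrix
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        neigh = adj @ node_feats / deg                      # mean over neighbours
        return torch.relu(self.self_lin(node_feats) + self.neigh_lin(neigh))


class AutoregressivePolicy(nn.Module):
    """pi(a, o | s) = pi(a | s) * pi(o | s, a): action type first, then its object."""

    def __init__(self, hid_dim, num_action_types):
        super().__init__()
        self.gnn = SimpleGNN(hid_dim, hid_dim)
        self.type_head = nn.Linear(hid_dim, num_action_types)     # global action-type head
        self.obj_head = nn.Linear(hid_dim + num_action_types, 1)  # per-object score

    def forward(self, node_feats, adj):
        h = self.gnn(node_feats, adj)                       # (N, hid_dim) object embeddings
        graph_emb = h.mean(dim=0)                           # permutation-invariant pooling
        # Step 1: sample the action type from a global distribution.
        type_logits = self.type_head(graph_emb)
        a = torch.distributions.Categorical(logits=type_logits).sample()
        # Step 2: sample the object parameter, conditioned on the chosen type.
        a_onehot = nn.functional.one_hot(a, type_logits.numel()).float()
        obj_logits = self.obj_head(
            torch.cat([h, a_onehot.expand(h.size(0), -1)], dim=1)
        ).squeeze(-1)                                       # one logit per object
        o = torch.distributions.Categorical(logits=obj_logits).sample()
        return a.item(), o.item()


# Toy usage: a state with 4 objects, 8-dim features, and 3 action types.
policy = AutoregressivePolicy(hid_dim=8, num_action_types=3)
feats = torch.randn(4, 8)
adj = torch.ones(4, 4)
print(policy(feats, adj))
```

Because both the pooling and the per-object head operate on a variable number of node embeddings, the same network applies unchanged to states with any number of objects, which is what makes zero-shot generalization across problem sizes possible in principle.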