It has recently been shown that general policies for many classical planning domains can be expressed and learned in terms of a pool of features defined from the domain predicates using a description logic grammar. At the same time, most description logics correspond to a fragment of $k$-variable counting logic ($C_k$) for $k=2$, which has been shown to provide a tight characterization of the expressive power of graph neural networks. In this work, we make use of these results to understand the power and limits of using graph neural networks (GNNs) for learning optimal general policies over a number of tractable planning domains where such policies are known to exist. For this, we train a simple GNN in a supervised manner to approximate the optimal value function $V^{*}(s)$ over a number of sample states $s$. As predicted by the theory, it is observed that general optimal policies are obtained in domains where general optimal value functions can be defined with $C_2$ features, but not in those requiring the more expressive $C_3$ features. In addition, it is observed that the features learned are in close correspondence with the features needed to express $V^{*}$ in closed form. The theory and the analysis of the domains let us understand which features are actually learned and which cannot be learned in this way, and let us move in a principled manner from a combinatorial optimization approach to learning general policies to a potentially more robust and scalable approach based on deep learning.
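The abstract only describes the training setup at a high level. The following is a minimal sketch, in plain PyTorch, of supervised value regression with a simple message-passing GNN over a state graph: it is an illustration under assumptions (the architecture, the node encoding, and the `dataset` of (state graph, $V^{*}(s)$) pairs are hypothetical), not the paper's actual model.

```python
# Minimal sketch (assumed, not the paper's architecture): a simple message-passing
# GNN that maps a state graph to a scalar value V(s), trained to regress V*(s).
import torch
import torch.nn as nn

class SimpleGNN(nn.Module):
    def __init__(self, node_dim=32, rounds=3):
        super().__init__()
        self.rounds = rounds
        self.msg = nn.Linear(node_dim, node_dim)        # message function
        self.upd = nn.GRUCell(node_dim, node_dim)       # node-state update
        self.readout = nn.Sequential(                   # pooled embedding -> V(s)
            nn.Linear(node_dim, node_dim), nn.ReLU(), nn.Linear(node_dim, 1))

    def forward(self, x, edge_index):
        # x: [num_nodes, node_dim] initial node features (e.g., encoding of atoms/objects)
        # edge_index: [2, num_edges] directed edges of the state graph
        src, dst = edge_index
        for _ in range(self.rounds):
            messages = torch.zeros_like(x)
            # aggregate incoming messages by summing over destination nodes
            messages.index_add_(0, dst, self.msg(x[src]))
            x = self.upd(messages, x)
        return self.readout(x.sum(dim=0))               # sum pooling, then scalar value

# Supervised training over (state graph, V*(s)) pairs; `dataset` is a hypothetical
# iterable of (x, edge_index, v_star) tuples for sampled states.
def train(model, dataset, epochs=100, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, edge_index, v_star in dataset:
            opt.zero_grad()
            loss = (model(x, edge_index) - v_star).pow(2).mean()  # MSE to V*(s)
            loss.backward()
            opt.step()
```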