Learning a well-informed heuristic function for hard task planning domains remains an elusive problem. Although neural network architectures that can represent such heuristic knowledge are known, it is not obvious what concrete information they learn, or whether techniques aimed at understanding the structure help improve the quality of the heuristics. This paper presents a network model that learns, by imitating optimal plans, a heuristic capable of relating distant parts of the state space through the attention mechanism, which drastically improves the learning of a good heuristic function. To counter the method's limitation in creating problems of increasing difficulty, we demonstrate the use of curriculum learning, in which newly solved problem instances are added to the training set. This, in turn, helps to solve problems of higher complexity and far exceeds the performance of all existing baselines, including classical planning heuristics. We demonstrate the effectiveness of the approach on grid-type PDDL domains.
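The curriculum idea described above (solve what the current heuristic can solve, add those instances to the training set, retrain, and repeat) can be illustrated with a minimal sketch. This is not the paper's implementation: the names `curriculum_loop`, `fit`, and `solve` are hypothetical, and the toy model at the bottom is a stand-in for a trained network and a heuristic-guided planner, used only to make the loop runnable.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

P = TypeVar("P")  # problem instance type (hypothetical)
M = TypeVar("M")  # heuristic model type (hypothetical)


def curriculum_loop(
    problems: Sequence[P],
    fit: Callable[[List[Tuple[P, list]]], M],
    solve: Callable[[M, P], Optional[list]],
    max_rounds: int = 10,
) -> Tuple[M, List[Tuple[P, list]], List[P]]:
    """Grow the training set with newly solved instances, retraining each round.

    `fit` and `solve` are assumed callbacks: `fit` trains a heuristic model
    from (problem, plan) pairs, and `solve` returns a plan or None.
    """
    training_set: List[Tuple[P, list]] = []
    unsolved: List[P] = list(problems)
    model = fit(training_set)

    for _ in range(max_rounds):
        if not unsolved:
            break
        newly_solved: List[Tuple[P, list]] = []
        still_unsolved: List[P] = []
        for problem in unsolved:
            plan = solve(model, problem)
            if plan is None:
                still_unsolved.append(problem)
            else:
                newly_solved.append((problem, plan))
        if not newly_solved:
            break  # no progress this round, so retraining would change nothing
        training_set.extend(newly_solved)
        model = fit(training_set)  # retrain on the enlarged training set
        unsolved = still_unsolved

    return model, training_set, unsolved


# Toy stand-ins (assumptions for illustration only): a problem is an integer
# difficulty, and the "model" is the hardest difficulty it can currently handle.
def toy_fit(training_set: List[Tuple[int, list]]) -> int:
    # Generalizes one step beyond the hardest instance seen so far.
    return 1 + max((p for p, _ in training_set), default=0)


def toy_solve(model: int, problem: int) -> Optional[list]:
    # Succeeds only if the problem is within the model's current reach.
    return list(range(problem)) if problem <= model else None


model, solved, unsolved = curriculum_loop([1, 2, 3, 5], toy_fit, toy_solve)
```

In this toy run the instances of difficulty 1, 2, and 3 are solved in successive rounds, while the instance of difficulty 5 stays unsolved because no instance of difficulty 4 exists to bridge the gap. That is the kind of difficulty gap a curriculum of gradually harder problems is meant to avoid.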