Left Heavy Tails and the Effectiveness of the Policy and Value Networks in DNN-based Best-First Search for Sokoban Planning

Despite the success of practical solvers in various NP-complete domains such as SAT and CSP, and of deep reinforcement learning in two-player games such as Go, certain classes of PSPACE-hard planning problems have remained out of reach. Even carefully designed domain-specialized solvers can fail quickly on hard instances because of the exponential search space. Recent works that combine traditional search methods, such as best-first search and Monte Carlo tree search, with heuristics learned by deep neural networks (DNNs) have shown promising progress and can solve a significant number of hard planning instances beyond the reach of specialized solvers. To better understand why these approaches work, we studied the interplay of the policy and value networks in DNN-based best-first search on Sokoban and show the surprising effectiveness of the policy network, further enhanced by the value network, as a guiding heuristic for the search. To understand this phenomenon further, we studied the cost distribution of the search algorithms and found that Sokoban instances can have heavy-tailed runtime distributions, with tails on both the left- and right-hand sides. In particular, we show for the first time the existence of left heavy tails and propose an abstract tree model that can empirically explain the appearance of these tails. The experiments show the critical role of the policy network as a powerful heuristic guiding the search, which can lead to left heavy tails with polynomial scaling by avoiding the exploration of exponentially sized subtrees. Our results also demonstrate the importance of random restarts, as widely used in traditional combinatorial solvers, for DNN-based search methods to avoid left and right heavy tails.
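The link between exponentially sized subtrees, heavy-tailed search cost, and random restarts can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's abstract tree model or its DNN-guided best-first search: it assumes a hypothetical binary tree of depth `depth` in which, at each level, a stand-in for the policy network picks the branch containing the solution with probability `p_correct`, and otherwise exhausts the solution-free sibling subtree (`2**(depth - level) - 1` nodes) before backtracking. The names `run_once`, `run_with_restarts`, `p_correct`, and `cutoff` are illustrative only.

```python
import random
import statistics
from typing import Optional, Tuple


def run_once(rng: random.Random, depth: int, p_correct: float,
             budget: Optional[int] = None) -> Tuple[bool, int]:
    """One randomized descent through a toy binary search tree (hypothetical model).

    At each level the stand-in policy picks the branch containing the
    solution with probability p_correct. Otherwise the search first
    exhausts the solution-free sibling subtree, costing
    2**(depth - level) - 1 node expansions, before backtracking.
    Returns (solved, nodes_spent).
    """
    nodes = 0
    for level in range(depth):
        if rng.random() >= p_correct:
            # Wrong branch: explore the whole dead-end subtree below this level.
            nodes += 2 ** (depth - level) - 1
        nodes += 1  # expand the node on the correct branch
        if budget is not None and nodes > budget:
            return False, budget  # cutoff reached: abandon this run
    return True, nodes


def run_with_restarts(rng: random.Random, depth: int, p_correct: float,
                      cutoff: int, max_restarts: int = 10_000) -> Tuple[int, int]:
    """Repeat independent runs with a fixed node cutoff until one succeeds.

    Returns (total_nodes_over_all_runs, number_of_runs).
    """
    total = 0
    for attempt in range(1, max_restarts + 1):
        solved, cost = run_once(rng, depth, p_correct, budget=cutoff)
        total += cost
        if solved:
            return total, attempt
    raise RuntimeError("no run finished within the restart limit")


if __name__ == "__main__":
    rng = random.Random(0)
    depth, p_correct, cutoff, trials = 20, 0.97, 100, 2000  # illustrative values

    plain = [run_once(rng, depth, p_correct)[1] for _ in range(trials)]
    restarted = [run_with_restarts(rng, depth, p_correct, cutoff)[0]
                 for _ in range(trials)]

    # Without restarts the median stays close to `depth`, but rare mistakes
    # near the root (about 2**20 nodes each) dominate the mean.
    print("no restarts:   median", statistics.median(plain),
          "mean", round(statistics.mean(plain)))
    print("with restarts: median", statistics.median(restarted),
          "mean", round(statistics.mean(restarted)))
```

With these illustrative parameters, more than half of the runs without restarts expand exactly `depth` nodes, while the rare runs that err near the root cost on the order of 2**20 nodes and dominate the mean. Capping each run at `cutoff` nodes and restarting with fresh randomness trades a few cheap failed runs for avoiding those outliers; the cutoff value here is an arbitrary choice, not one taken from the paper.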
