A*搜索无扩展：用深度Q网络学习启发式函数 (A* Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks)

from arxiv, Added theoretical results to show that Q* search is an admissible search algorithm. Added comparisons to deferred heuristic evaluation. Added experiments with Lights Out and the 35-Pancake puzzle

Efficiently solving problems with large action spaces using A* search has been of importance to the artificial intelligence community for decades. This is because the computation and memory requirements of A* search grow linearly with the size of the action space. This burden becomes even more apparent when A* search uses a heuristic function learned by computationally expensive function approximators, such as deep neural networks. To address this problem, we introduce Q* search, a search algorithm that uses deep Q-networks to guide search in order to take advantage of the fact that the sum of the transition costs and heuristic values of the children of a node can be computed with a single forward pass through a deep Q-network without explicitly generating those children. This significantly reduces computation time and requires only one node to be generated per iteration. We use Q* search to solve the Rubik's cube when formulated with a large action space that includes 1872 meta-actions and find that this 157-fold increase in the size of the action space incurs less than a 4-fold increase in computation time and less than a 3-fold increase in number of nodes generated when performing Q* search. Furthermore, Q* search is up to 129 times faster and generates up to 1288 times fewer nodes than A* search. Finally, although obtaining admissible heuristic functions from deep neural networks is an ongoing area of research, we prove that Q* search is guaranteed to find a shortest path given a heuristic function that neither overestimates the cost of a shortest path nor underestimates the transition cost.

翻译：解决具有大动作空间的问题需要高效地使用A*搜索，这对于人工智能社区几十年来一直是非常重要的。这是因为A*搜索的计算和内存要求与动作空间的大小成线性关系。当A*搜索使用计算密集型函数逼近器（例如深度神经网络）学习启发函数时，这个负担变得更为明显。为了解决这个问题，我们引入了Q*搜索，一种使用深度Q网络指导搜索的搜索算法，以利用一个节点的子节点的转移成本和启发值之和可以通过单次前向传递深度Q网络来计算，而无需明确生成这些子节点的事实。这显著减少了计算时间，并且每次迭代仅需要生成一个节点。我们使用Q*搜索解决魔方问题，该问题用大动作空间（包括1872个元动作）制定，并发现该动作空间增加了157倍，因此执行Q*搜索的计算时间仅增加了不到4倍，生成的节点数量增加了不到3倍。此外，与A*搜索相比，Q*搜索速度提高了129倍，生成的节点数减少了1288倍。最后，虽然从深度神经网络获得合理的启发式函数仍是一个正在研究的领域，但我们证明，如果存在一个既不高估最短路径的成本，也不低估转移成本的启发函数，则Q*搜索将保证找到一条最短路径。