Optimization of heuristic functions for the A* algorithm, when they are realized by deep neural networks, is usually done by minimizing the squared loss between the estimated and the true cost-to-goal values. This paper argues that this does not necessarily lead to a faster A* search, since A* relies on the relative values of the heuristic rather than on its absolute ones. As a mitigation, we propose the L* loss, which upper-bounds the number of excessively expanded states inside the A* search. When used to optimize state-of-the-art deep neural networks for automated planning in maze domains such as Sokoban and mazes with teleports, the L* loss significantly improves the fraction of solved problems and the quality of the found plans, and reduces the number of expanded states to approximately 50%.
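The gap between "small regression error" and "good ordering" can be shown with a few made-up numbers. The sketch below is illustrative only. The cost-to-goal values and the helpers `mse` and `misordered_pairs` are hypothetical and do not come from the paper. It also compares states by `h` alone, which is what A* effectively does when the states share the same path cost `g`. A heuristic with small errors can still swap two states. By contrast, a heuristic offset by a constant has a large squared error but keeps the order perfectly. Adding the same constant to every `h` shifts every `f = g + h` equally, so it leaves A*'s expansion order unchanged.

```python
from itertools import combinations


def mse(h, true_cost):
    """Mean squared error between heuristic values and true cost-to-goal."""
    return sum((a - b) ** 2 for a, b in zip(h, true_cost)) / len(h)


def misordered_pairs(h, true_cost):
    """Count state pairs whose heuristic order contradicts the true order."""
    count = 0
    for i, j in combinations(range(len(h)), 2):
        # Misordered if the heuristic strictly disagrees with the true ordering
        if (true_cost[i] - true_cost[j]) * (h[i] - h[j]) < 0:
            count += 1
    return count


# Hypothetical cost-to-goal values for four states
true_cost = [1.0, 2.0, 3.0, 4.0]
h_close = [2.1, 1.9, 3.1, 3.9]    # small errors, but states 0 and 1 swap order
h_shifted = [6.0, 7.0, 8.0, 9.0]  # constant offset of 5: large error, perfect order

print(misordered_pairs(h_close, true_cost))    # 1
print(misordered_pairs(h_shifted, true_cost))  # 0
```

Under the squared loss, `h_close` looks far better (MSE about 0.31 versus 25.0), yet it is the only one of the two that can mislead the search.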
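One way a loss can "upper-bound" excess expansions is to replace a count of misordered state pairs with a differentiable surrogate that is never smaller than that count. The sketch below is a generic pairwise ranking surrogate written in the spirit of the L* loss. It is **not** the authors' exact formulation. Several details are assumptions: A* orders states by `f = g + h`, and the set of states on an optimal path (`f_on_path`) and off it (`f_off_path`) is given. The paper's exact choice of which states are paired, and how pairs are weighted, may differ. The names `softplus_log2`, `ranking_loss` and `misordered` are hypothetical. The key property is that `log2(1 + exp(x)) >= 1` whenever `x >= 0`. Each pair whose on-path state does not have a strictly lower `f` therefore contributes at least 1.

```python
import math


def softplus_log2(x):
    """Numerically stable log2(1 + exp(x)); equals 1 at x = 0 and exceeds 1 for x > 0."""
    # max(x, 0) + log1p(exp(-|x|)) equals ln(1 + e^x) without overflowing exp
    return (max(x, 0.0) + math.log1p(math.exp(-abs(x)))) / math.log(2)


def ranking_loss(f_on_path, f_off_path):
    """Sum of surrogate penalties over all (on-path, off-path) state pairs.

    A pair is penalized more the higher the on-path f-value is relative to
    the off-path one, so minimizing the loss pushes on-path states ahead in
    A*'s priority queue. Because each term is >= 1 whenever f_on >= f_off,
    the total upper-bounds the number of such misordered pairs.
    """
    return sum(softplus_log2(a - b) for a in f_on_path for b in f_off_path)


def misordered(f_on_path, f_off_path):
    """Exact (non-differentiable) count of pairs with f_on >= f_off."""
    return sum(1 for a in f_on_path for b in f_off_path if a >= b)


# Hypothetical f-values: states on an optimal path vs. states off it
f_on = [4.0, 4.0, 5.0]
f_off = [4.5, 6.0, 3.0]

print(misordered(f_on, f_off))  # 4
print(ranking_loss(f_on, f_off) >= misordered(f_on, f_off))  # True
```

In a learning setting, the `f`-values would come from `g` plus a network's `h` output, and the same expression would be written with a tensor library so that gradients flow into `h`. Plain floats are used here only to keep the sketch self-contained.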