A longstanding objective in classical planning is to synthesize policies that generalize across multiple problems from the same domain. In this work, we study generalized policy search-based methods with a focus on the score function used to guide the search over policies. We demonstrate limitations of two score functions and propose a new approach that overcomes these limitations. The main idea behind our approach, Policy-Guided Planning for Generalized Policy Generation (PG3), is that a candidate policy should be used to guide planning on training problems as a mechanism for evaluating that candidate. Theoretical results in a simplified setting give conditions under which PG3 is optimal or admissible. We then study a specific instantiation of policy search where planning problems are PDDL-based and policies are lifted decision lists. Empirical results in six domains confirm that PG3 learns generalized policies more efficiently and effectively than several baselines. Code: https://github.com/ryangpeixu/pg3
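The core idea above, scoring a candidate policy by how cheaply it guides a planner to solutions on training problems, can be sketched in a toy setting. Everything below is a hypothetical illustration, not the paper's actual method: the real PG3 operates on PDDL problems with lifted decision-list policies, whereas this sketch uses a number-line domain, and the names `policy_guided_plan` and `pg3_style_score` are invented for the example. Policy guidance is approximated here as a preferred-operator bias in best-first search.

```python
from heapq import heappush, heappop

# Hypothetical toy domain: states are integers on a number line,
# actions move +1 or -1, and the goal is to reach a target value.
ACTIONS = (+1, -1)

def successors(state):
    return [(a, state + a) for a in ACTIONS]

def policy_guided_plan(policy, init, goal, max_nodes=1000):
    """Best-first search in which the candidate policy's preferred action
    is expanded first (a crude stand-in for policy-guided planning).
    Returns the number of nodes expanded, or None on failure."""
    frontier = [(0, 0, init)]  # (priority, tiebreak, state)
    seen = {init}
    expanded = 0
    tiebreak = 0
    while frontier and expanded < max_nodes:
        _, _, s = heappop(frontier)
        expanded += 1
        if s == goal:
            return expanded
        preferred = policy(s, goal)
        for a, s2 in successors(s):
            if s2 in seen:
                continue
            seen.add(s2)
            tiebreak += 1
            cost = 0 if a == preferred else 1  # policy-preferred action first
            heappush(frontier, (cost, tiebreak, s2))
    return None  # search exhausted its node budget

def pg3_style_score(policy, problems):
    """Score a candidate policy by the total planning effort it induces on
    the training problems (lower is better); failed searches are penalized."""
    total = 0
    for init, goal in problems:
        n = policy_guided_plan(policy, init, goal)
        total += n if n is not None else 10_000
    return total
```

In this sketch a policy that points toward the goal (e.g. `lambda s, g: 1 if s < g else -1`) yields far cheaper guided searches, and hence a better score, than one that points away from it; a policy search procedure would use such scores to rank candidate policies on the training problems.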