Creating a domain model, even for classical, domain-independent planning, is a notoriously hard knowledge-engineering task. A natural approach to this problem is to learn a domain model from observations. However, model learning approaches frequently do not provide safety guarantees: the learned model may assume actions are applicable when they are not, and may capture actions' effects incorrectly. Plans generated from such a model may fail when executed. In some domains such failures are not acceptable, due to the cost of failure or the inability to replan online after a failure. In such settings, all learning must be done offline, based on observations collected, e.g., by other agents or a human. The task is then to use what was learned to generate a plan that is guaranteed to succeed. This is called the model-free planning problem. Prior work proposed an algorithm for solving the model-free planning problem in classical planning. However, that algorithm was limited to learning grounded domains, and thus could not scale. We generalize this prior work and propose the first safe model-free planning algorithm for lifted domains. We prove the correctness of our approach, and provide a statistical analysis showing that the number of trajectories needed to solve future problems with high probability is linear in the potential size of the domain model. We also present experiments on twelve IPC domains showing that our approach is able to learn the real action model in all cases with at most two trajectories.
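To make the notion of a safe learned model concrete, the following is a minimal sketch of conservative learning in the simpler grounded, propositional setting. It illustrates the general idea only and is not the lifted algorithm proposed here; the function name `learn_conservative_model`, the trajectory encoding, and the example fluents are assumptions made for this example. Preconditions are over-approximated as every fluent that held in all observed pre-states of an action, so the learned action is never deemed applicable in a state that lacks something all observed executions had. Effects are restricted to the changes that were actually observed.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

State = FrozenSet[str]
# One observed transition: (state before, grounded action name, state after).
Step = Tuple[State, str, State]


def learn_conservative_model(steps: Iterable[Step]) -> Dict[str, Dict[str, Set[str]]]:
    """Learn a conservative grounded action model from observed transitions.

    Hypothetical helper for illustration: preconditions are the intersection
    of all observed pre-states, effects are the union of observed changes.
    """
    pre: Dict[str, Set[str]] = {}
    add: Dict[str, Set[str]] = {}
    delete: Dict[str, Set[str]] = {}
    for before, action, after in steps:
        if action not in pre:
            # First observation: assume every fluent in the pre-state is required.
            pre[action] = set(before)
            add[action] = set()
            delete[action] = set()
        else:
            # A fluent missing from some pre-state cannot be a precondition.
            pre[action] &= before
        add[action] |= after - before
        delete[action] |= before - after
    return {a: {"pre": pre[a], "add": add[a], "del": delete[a]} for a in pre}


if __name__ == "__main__":
    observed = [
        (frozenset({"at_a", "fuel"}), "move_a_b", frozenset({"at_b", "fuel"})),
        (frozenset({"at_a", "fuel", "raining"}), "move_a_b",
         frozenset({"at_b", "fuel", "raining"})),
    ]
    print(learn_conservative_model(observed))
```

In this sketch, the second observation removes `raining` from the candidate preconditions of `move_a_b`, while `fuel` stays because no observation shows the action executed without it. Keeping such possibly unnecessary preconditions is what makes the model conservative: it may reject some plans that would work, but under these assumptions it does not accept an action in a state that differs from every observed execution.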