Recent work on approximate linear programming (ALP) techniques for first-order Markov Decision Processes (FOMDPs) represents the value function as a linear combination of first-order basis functions and uses linear programming techniques to determine suitable weights. This approach has the advantage that it does not require simplification of the first-order value function and allows FOMDPs to be solved independently of any specific domain instantiation. In this paper, we address several questions to enhance the applicability of this work: (1) Can we extend the first-order ALP framework to approximate policy iteration in order to address performance deficiencies of previous approaches? (2) Can we automatically generate basis functions and evaluate their impact on value function quality? (3) How can we decompose intractable problems with universally quantified rewards into tractable subproblems? We propose answers to these questions along with a number of novel optimizations, and we provide a comparative empirical evaluation on logistics problems from the ICAPS 2004 Probabilistic Planning Competition.
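As background, the following is a minimal sketch of the generic ground-MDP ALP formulation that this line of work lifts to the first-order case; the notation (weights $w_i$, basis functions $b_i$, state-relevance weights $\alpha$, reward $R$, transition model $P$, discount $\gamma$) is standard ALP convention assumed here for illustration, not drawn from this abstract.

\[
V(s) \;\approx\; \sum_i w_i\, b_i(s), \qquad
\begin{aligned}
\min_{w} \;\; & \sum_{s} \alpha(s) \sum_i w_i\, b_i(s) \\
\text{s.t.} \;\; & \sum_i w_i\, b_i(s) \;\ge\; R(s,a) \;+\; \gamma \sum_{s'} P(s' \mid s, a) \sum_i w_i\, b_i(s') \quad \forall\, s, a
\end{aligned}
\]

In the first-order setting, the basis functions $b_i$ are first-order formulae and the (otherwise exponential or infinite) sets of states and constraints are handled symbolically rather than by enumeration.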