Most approaches to goal recognition rely on specifications of the possible dynamics of the actor in the environment when pursuing a goal. These specifications suffer from two key issues. First, encoding these dynamics requires careful design by a domain expert, and the resulting models are often not robust to noise at recognition time. Second, existing approaches often need costly real-time computations to reason about the likelihood of each potential goal. In this paper, we develop a framework that combines model-free reinforcement learning and goal recognition to alleviate both the need for careful, manual domain design and the need for costly online executions. This framework consists of two main stages: offline learning of policies or utility functions for each potential goal, and online inference. We provide a first instance of this framework using tabular Q-learning for the learning stage, together with three measures that can be used to perform the inference stage. The resulting instantiation achieves state-of-the-art performance against existing goal recognizers on standard evaluation domains, and superior performance in noisy environments.
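Below is a minimal sketch of the two-stage pipeline in Python, assuming a small deterministic gridworld with one candidate goal per corner: tabular Q-learning is run offline for each candidate goal, and online inference ranks the goals by a single illustrative measure (the cumulative Q-advantage of the observed actions under each goal's Q-function). The environment, hyperparameters, and this particular measure are assumptions made for illustration only; they are not the paper's evaluation domains or its exact three measures.

```python
# Minimal sketch (assumptions): a 5x5 gridworld, one Q-table per candidate goal
# learned offline, and one illustrative online measure over an observed trajectory.
import random
from collections import defaultdict

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
SIZE = 5

def step(state, action):
    """Deterministic grid transition; walls clip movement."""
    r = min(max(state[0] + action[0], 0), SIZE - 1)
    c = min(max(state[1] + action[1], 0), SIZE - 1)
    return (r, c)

def q_learn(goal, episodes=2000, alpha=0.1, gamma=0.95, eps=0.2):
    """Offline stage: learn a tabular Q-function for one candidate goal."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = (0, 0)
        for _ in range(50):
            if random.random() < eps:
                a = random.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda a_: Q[(s, a_)])
            s2 = step(s, ACTIONS[a])
            r = 1.0 if s2 == goal else -0.01
            best_next = max(Q[(s2, a_)] for a_ in range(len(ACTIONS)))
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
            if s == goal:
                break
    return Q

def score(Q, observations):
    """Online stage (one illustrative measure): how close each observed action's
    Q-value is to the best Q-value in that state; higher means more consistent."""
    total = 0.0
    for s, a in observations:
        best = max(Q[(s, a_)] for a_ in range(len(ACTIONS)))
        total += Q[(s, a)] - best  # non-positive; zero when the action is greedy
    return total

if __name__ == "__main__":
    goals = [(0, SIZE - 1), (SIZE - 1, 0), (SIZE - 1, SIZE - 1)]
    q_tables = {g: q_learn(g) for g in goals}  # offline learning, one table per goal
    # Observed partial trajectory of an actor heading toward the bottom-right corner.
    obs = [((0, 0), 3), ((0, 1), 1), ((1, 1), 3), ((1, 2), 1)]
    ranked = sorted(goals, key=lambda g: score(q_tables[g], obs), reverse=True)
    print("Most likely goal:", ranked[0])
```

Because all Q-tables are learned offline, the online stage reduces to table lookups over the observed state-action pairs, which is what removes the need for costly online executions in this sketch.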