Humans and animals have the ability to reason and make predictions about different courses of action at many time scales. In reinforcement learning, option models (Sutton, Precup & Singh, 1999; Precup, 2000) provide the framework for this kind of temporally abstract prediction and reasoning. Natural intelligent agents are also able to focus their attention on courses of action that are relevant or feasible in a given situation, sometimes termed affordable actions. In this paper, we define a notion of affordances for options and develop temporally abstract partial option models, which take into account the fact that an option might be affordable only in certain situations. We analyze the trade-offs between estimation and approximation error in planning and learning when using such models, and identify some interesting special cases. Additionally, we demonstrate empirically the potential impact of partial option models on the efficiency of planning.