Active reinforcement learning (ARL) is a variant of reinforcement learning in which the agent does not observe the reward unless it chooses to pay a query cost c > 0. The central question of ARL is how to quantify the long-term value of reward information. Even in multi-armed bandits, computing the value of this information is intractable, so we have to rely on heuristics. We propose and evaluate several heuristic approaches for ARL in multi-armed bandits and (tabular) Markov decision processes, and we discuss and illustrate some challenging aspects of the ARL problem.
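To make the setting concrete, the following is a minimal sketch of an ARL-style multi-armed bandit, not one of the heuristics proposed in the paper. It assumes Bernoulli arms, assumes that the reward still counts toward the return when it is not observed, and uses a deliberately simple illustrative rule: query each arm a fixed number of times, then stop querying and exploit the best estimate. All names (`ActiveBandit`, `run_query_budget_heuristic`, `queries_per_arm`) are hypothetical.

```python
import random


class ActiveBandit:
    """Bernoulli bandit in which observing a reward costs `query_cost` (c > 0)."""

    def __init__(self, means, query_cost, seed=0):
        self.means = means
        self.query_cost = query_cost
        self.rng = random.Random(seed)

    def pull(self, arm, query):
        reward = 1.0 if self.rng.random() < self.means[arm] else 0.0
        # Assumption: the reward is always accrued, but it is only
        # revealed to the agent when the agent pays the query cost.
        observed = reward if query else None
        net = reward - (self.query_cost if query else 0.0)
        return observed, net


def run_query_budget_heuristic(env, n_arms, horizon, queries_per_arm):
    """Illustrative rule: query each arm `queries_per_arm` times, then exploit."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    total_return = 0.0
    n_queries = 0

    def estimate(a):
        # Fall back to a neutral guess for arms that were never queried.
        return sums[a] / counts[a] if counts[a] > 0 else 0.5

    for _ in range(horizon):
        under_sampled = [a for a in range(n_arms) if counts[a] < queries_per_arm]
        if under_sampled:
            arm, query = under_sampled[0], True
        else:
            arm, query = max(range(n_arms), key=estimate), False

        observed, net = env.pull(arm, query)
        if observed is not None:
            counts[arm] += 1
            sums[arm] += observed
            n_queries += 1
        total_return += net

    return total_return, n_queries


if __name__ == "__main__":
    env = ActiveBandit(means=[0.2, 0.8], query_cost=0.1, seed=0)
    print(run_query_budget_heuristic(env, n_arms=2, horizon=100, queries_per_arm=5))
```

The fixed budget `queries_per_arm` stands in for the quantity the abstract identifies as hard to compute: a principled agent would choose when to query by weighing the cost c against the long-term value of the reward information, rather than using a constant set in advance.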