The performance of reinforcement learning depends upon designing an appropriate action space, where the effect of each action is measurable, yet granular enough to permit flexible behavior. So far, this process has involved non-trivial user choices regarding the available actions and their execution frequency. We propose a novel framework for reinforcement learning that effectively lifts such constraints. Within our framework, agents learn effective behavior over a routine space: a new, higher-level action space, where each routine represents a set of 'equivalent' sequences of granular actions of arbitrary length. Our routine space is learned end-to-end to facilitate the accomplishment of the underlying off-policy reinforcement learning objectives. We apply our framework to two state-of-the-art off-policy algorithms and show that the resulting agents obtain relevant performance improvements while requiring fewer interactions with the environment per episode, improving computational efficiency.
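To make the routine-space idea concrete, the following is a minimal illustrative sketch, not the paper's implementation: it assumes a hypothetical RoutineDecoder that maps a latent routine vector to a variable-length sequence of primitive actions, which the agent then executes against a Gym-style environment (old 4-tuple step API). In the actual framework, such a decoder would be learned end-to-end alongside the off-policy objective rather than fixed at random as here.

```python
# Minimal sketch (hypothetical, not the authors' code): decoding one high-level
# "routine" into a variable-length sequence of primitive actions and executing it.
import numpy as np

class RoutineDecoder:
    """Hypothetical decoder: maps a latent routine z to up to max_len primitive actions."""
    def __init__(self, routine_dim, action_dim, max_len, seed=0):
        rng = np.random.default_rng(seed)
        # One linear map per routine step (placeholder for a learned network).
        self.W = rng.normal(scale=0.1, size=(max_len, action_dim, routine_dim))
        self.max_len = max_len

    def decode(self, z, length):
        """Return the primitive-action sequence this routine stands for."""
        length = int(np.clip(length, 1, self.max_len))
        # Each step is a deterministic function of the same latent z, squashed to [-1, 1].
        return [np.tanh(self.W[t] @ z) for t in range(length)]

def run_routine(env, decoder, z, length):
    """Execute one routine: step the environment once per decoded primitive action."""
    total_reward, done, obs = 0.0, False, None
    for action in decoder.decode(z, length):
        obs, reward, done, _ = env.step(action)  # assumes old Gym 4-tuple API
        total_reward += reward
        if done:
            break
    # From the agent's perspective, the whole routine is a single high-level action.
    return obs, total_reward, done
```

In this sketch the agent's policy would output the pair (z, length) instead of a single low-level action, so each policy decision commits the agent to several environment steps, which is what reduces the number of per-episode decisions in the framework described above.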