In this paper, we propose a novel Reinforcement Learning (RL) framework for problems with continuous action spaces: Action Quantization from Demonstrations (AQuaDem). The proposed approach consists in learning a discretization of continuous action spaces from human demonstrations. This discretization returns a set of plausible actions (in light of the demonstrations) for each input state, thus capturing the priors of the demonstrator and their multimodal behavior. By discretizing the action space in this way, any discrete-action deep RL technique can be readily applied to the continuous control problem. Experiments show that the proposed approach outperforms state-of-the-art methods such as SAC in the RL setup and GAIL in the Imitation Learning setup. We provide a website with interactive videos: https://google-research.github.io/aquadem/ and make the code available: https://github.com/google-research/google-research/tree/master/aquadem.
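To make the idea concrete, here is a deliberately simplified sketch of action quantization. AQuaDem itself learns *state-conditioned* candidate actions with a neural network trained on demonstrations; the toy version below drops the state conditioning and simply clusters the demonstrated continuous actions into a small set of candidates with k-means, so a discrete-action RL agent can then choose among candidate indices. All names here (`quantize_actions`, the synthetic two-mode demonstrations) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def quantize_actions(demo_actions, k=4, iters=50):
    """Toy, state-independent action quantization: cluster demonstrated
    continuous actions into k candidate actions with plain k-means.
    (AQuaDem's actual discretization is state-conditioned and learned.)"""
    # deterministic init: k points spread evenly through the demonstrations
    step = max(1, len(demo_actions) // k)
    centers = demo_actions[::step][:k].copy()
    for _ in range(iters):
        # assign each demonstrated action to its nearest candidate
        dists = np.linalg.norm(demo_actions[:, None] - centers[None], axis=-1)
        labels = dists.argmin(axis=1)
        # move each candidate to the mean of its assigned actions
        for j in range(k):
            assigned = demo_actions[labels == j]
            if len(assigned):
                centers[j] = assigned.mean(axis=0)
    return centers

# synthetic demonstrations with two behavior modes (multimodal demonstrator)
rng = np.random.default_rng(1)
demos = np.concatenate([
    rng.normal(-1.0, 0.05, size=(100, 2)),
    rng.normal(+1.0, 0.05, size=(100, 2)),
])
candidates = quantize_actions(demos, k=2)
# a discrete-action RL agent now picks an index in {0, 1};
# each index maps back to a plausible continuous action
```

With two demonstration modes around -1 and +1, the two recovered candidates land near those modes, illustrating how the discretization captures the demonstrator's multimodal behavior rather than averaging it away.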