While reinforcement learning has achieved considerable success in recent years, state-of-the-art models are often still limited by the size of the state and action spaces. Model-free reinforcement learning approaches use some form of state representation, and recent work has explored embedding techniques for actions, both with the aim of achieving better generalization and applicability. However, these approaches consider only states or actions, ignoring the interaction between them when generating embedded representations. In this work, we establish the theoretical foundations for the validity of training a reinforcement learning agent using embedded states and actions. We then propose a new approach for jointly learning embeddings for states and actions that combines aspects of model-free and model-based reinforcement learning and can be applied in both discrete and continuous domains. Specifically, we use a model of the environment to obtain embeddings for states and actions and present a generic architecture that leverages them to learn a policy. In this way, the embedded representations obtained via our approach enable better generalization over both states and actions by capturing similarities in the embedding spaces. Evaluations of our approach on several gaming, robotic control, and recommender system domains show that it significantly outperforms state-of-the-art models in both discrete and continuous domains with large state and action spaces, confirming its efficacy.
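To make the high-level description above concrete, the following is a minimal sketch of the idea: state and action embeddings are trained jointly, with an environment model providing a predictive signal on the embeddings and a policy acting in the embedded spaces. All module names, layer sizes, and the combined transition-prediction/policy loss below are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch (PyTorch) of joint state/action embedding learning with an
# environment model and a policy over the embedded representations.
import torch
import torch.nn as nn

class StateEncoder(nn.Module):
    def __init__(self, state_dim, embed_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, embed_dim))
    def forward(self, s):
        return self.net(s)

class ActionEncoder(nn.Module):
    def __init__(self, num_actions, embed_dim):
        super().__init__()
        self.table = nn.Embedding(num_actions, embed_dim)  # discrete actions
    def forward(self, a):
        return self.table(a)

class TransitionModel(nn.Module):
    """Predicts the next state's embedding from the current state and action embeddings."""
    def __init__(self, embed_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * embed_dim, 64), nn.ReLU(),
                                 nn.Linear(64, embed_dim))
    def forward(self, zs, za):
        return self.net(torch.cat([zs, za], dim=-1))

class EmbeddedPolicy(nn.Module):
    """Scores each action by comparing its embedding with the state embedding."""
    def __init__(self, embed_dim):
        super().__init__()
        self.proj = nn.Linear(embed_dim, embed_dim)
    def forward(self, zs, all_za):
        # all_za: (num_actions, embed_dim); returns a distribution over actions
        logits = self.proj(zs) @ all_za.t()
        return torch.distributions.Categorical(logits=logits)

# One joint training step on an observed transition (s, a, s') with placeholder data.
state_dim, num_actions, embed_dim = 8, 100, 16
enc_s, enc_a = StateEncoder(state_dim, embed_dim), ActionEncoder(num_actions, embed_dim)
model, policy = TransitionModel(embed_dim), EmbeddedPolicy(embed_dim)
opt = torch.optim.Adam([*enc_s.parameters(), *enc_a.parameters(),
                        *model.parameters(), *policy.parameters()], lr=1e-3)

s = torch.randn(state_dim)       # placeholder current state
a = torch.tensor(3)              # placeholder action index
s_next = torch.randn(state_dim)  # placeholder next state

zs, za = enc_s(s), enc_a(a)
# Model-based term: the embeddings must be predictive of the next state's embedding.
model_loss = nn.functional.mse_loss(model(zs, za), enc_s(s_next))
# Model-free term: illustrative REINFORCE-style update with a dummy return of 1.0.
dist = policy(zs, enc_a.table.weight)
policy_loss = -dist.log_prob(a) * 1.0
(model_loss + policy_loss).backward()
opt.step()
```

Because both loss terms backpropagate into the same encoders, the learned embeddings capture transition similarities between states and between actions, which is what allows the policy to generalize across large state and action spaces.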