While reinforcement learning has achieved considerable successes in recent years, state-of-the-art models are often still limited by the size of state and action spaces. Model-free reinforcement learning approaches use some form of state representation, and recent work has explored embedding techniques for actions, both with the aim of achieving better generalization and applicability. However, these approaches consider only states or actions, ignoring the interaction between them when generating embedded representations. In this work, we propose a new approach for jointly embedding states and actions that combines aspects of model-free and model-based reinforcement learning and can be applied in both discrete and continuous domains. Specifically, we use a model of the environment to obtain embeddings for states and actions and present a generic architecture that uses these embeddings to learn a policy. In this way, the embedded representations obtained via our approach enable better generalization over both states and actions by capturing similarities in the embedding spaces. Evaluations of our approach on several gaming and recommender system environments show that it significantly outperforms state-of-the-art models in discrete domains with large state/action spaces, confirming the efficacy of joint embedding and the overall superiority of our approach.
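To make the idea concrete, the following is a minimal sketch of such an architecture, assuming a PyTorch implementation; the module names, dimensions, and loss are illustrative assumptions, not the paper's actual implementation. An environment model is trained to predict the next-state embedding from the embedded (state, action) pair, which shapes the joint embedding space, while a policy head scores actions by similarity in that same space.

```python
# Illustrative sketch only (assumed PyTorch API; names and dimensions are hypothetical).
import torch
import torch.nn as nn

class JointEmbeddingAgent(nn.Module):
    def __init__(self, n_states, n_actions, embed_dim=32):
        super().__init__()
        self.state_embed = nn.Embedding(n_states, embed_dim)    # learned state embeddings
        self.action_embed = nn.Embedding(n_actions, embed_dim)  # learned action embeddings
        # model-based component: predict next-state embedding from (state, action) embeddings
        self.transition = nn.Sequential(
            nn.Linear(2 * embed_dim, 64), nn.ReLU(), nn.Linear(64, embed_dim)
        )
        # model-free component: map the state embedding to a query scored against action embeddings
        self.policy = nn.Sequential(
            nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, embed_dim)
        )

    def model_loss(self, s, a, s_next):
        # environment-model objective that shapes the joint state/action embedding space
        z = torch.cat([self.state_embed(s), self.action_embed(a)], dim=-1)
        pred = self.transition(z)
        return ((pred - self.state_embed(s_next)) ** 2).mean()

    def action_logits(self, s):
        # policy over embeddings: similarity of the policy output to each action embedding
        query = self.policy(self.state_embed(s))        # (batch, embed_dim)
        return query @ self.action_embed.weight.t()     # (batch, n_actions)

# usage: optimize model_loss alongside any standard policy-gradient or Q-learning objective
agent = JointEmbeddingAgent(n_states=1000, n_actions=200)
s = torch.randint(0, 1000, (8,))
a = torch.randint(0, 200, (8,))
s_next = torch.randint(0, 1000, (8,))
loss = agent.model_loss(s, a, s_next)
logits = agent.action_logits(s)
```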