The Cross-Entropy Method (CEM) is commonly used for planning in model-based reinforcement learning (MBRL), where a centralized approach is typically adopted: the sampling distribution is updated based only on the results of a top-$k$ operation over the samples. In this paper, we show that such a centralized approach makes CEM vulnerable to local optima, thus impairing its sample efficiency. To tackle this issue, we propose Decentralized CEM (DecentCEM), a simple but effective improvement over classical CEM that uses an ensemble of CEM instances running independently of one another, each performing a local improvement of its own sampling distribution. We provide both theoretical and empirical analyses to demonstrate the effectiveness of this simple decentralized approach. We show empirically that, compared to the classical centralized approach using either a single Gaussian or even a mixture of Gaussians, DecentCEM finds the global optimum far more consistently and thus improves sample efficiency. Furthermore, we plug DecentCEM into the planning component of MBRL and evaluate our approach in several continuous control environments, comparing against state-of-the-art CEM-based MBRL approaches (PETS and POPLIN). Results show that simply replacing the classical CEM module with our DecentCEM module improves sample efficiency while sacrificing only a reasonable amount of computational cost. Lastly, we conduct ablation studies for a more in-depth analysis. Code is available at https://github.com/vincentzhang/decentCEM
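To make the contrast concrete, below is a minimal Python sketch of centralized CEM versus the decentralized ensemble idea described in the abstract. It is illustrative only, not the authors' implementation (see the linked repository for that); the function names, hyperparameters, and the final best-across-instances selection rule are assumptions for the sketch.

```python
import numpy as np

def cem(objective, mu, sigma, n_samples=100, k=10, iters=20):
    """Classical (centralized) CEM: sample from a single Gaussian,
    keep the top-k elites, and refit the distribution to them."""
    for _ in range(iters):
        samples = np.random.normal(mu, sigma, size=(n_samples, mu.shape[0]))
        scores = np.array([objective(s) for s in samples])
        elites = samples[np.argsort(scores)[-k:]]  # top-k by score
        mu = elites.mean(axis=0)
        sigma = elites.std(axis=0) + 1e-6          # avoid premature collapse
    return mu

def decent_cem(objective, mus, sigmas, **cem_kwargs):
    """DecentCEM-style sketch: an ensemble of CEM instances, each
    independently improving its own sampling distribution; the best
    solution across instances is returned (selection rule assumed)."""
    solutions = [cem(objective, mu, sigma, **cem_kwargs)
                 for mu, sigma in zip(mus, sigmas)]
    return max(solutions, key=objective)

# Usage sketch: three instances initialized in different regions are
# less likely to all converge to the same local optimum.
if __name__ == "__main__":
    f = lambda x: -np.sum((x - 2.0) ** 2) + np.sum(np.cos(5.0 * x))
    mus = [np.full(3, c) for c in (-2.0, 0.0, 2.0)]
    sigmas = [np.ones(3)] * 3
    print(decent_cem(f, mus, sigmas))
```

The key design point the sketch highlights is that each instance refits only its own Gaussian from its own elites, rather than pooling all samples into one centralized top-$k$ update.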