Model-based reinforcement learning is a widely accepted approach to the problem of excessive sample demands. However, the predictions of learned dynamics models are often not accurate enough, and the resulting bias may lead to catastrophic decisions due to insufficient robustness. It is therefore highly desirable to investigate how to improve the robustness of model-based RL algorithms while maintaining high sample efficiency. In this paper, we propose Model-Based Double-dropout Planning (MBDP) to balance robustness and efficiency. MBDP consists of two dropout mechanisms: rollout-dropout aims to improve robustness at a small cost in sample efficiency, while model-dropout is designed to compensate for the lost efficiency at a slight expense of robustness. By combining them in a complementary way, MBDP provides a flexible control mechanism that meets different demands for robustness and efficiency by tuning the two corresponding dropout ratios. The effectiveness of MBDP is demonstrated both theoretically and experimentally.
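To make the double-dropout idea concrete, the sketch below shows one plausible way the two mechanisms could interact in an ensemble-based planning loop. It is a minimal illustration only: the function names, the choice to drop ensemble members at random, and the pessimism-style heuristic of discarding the highest-return imagined rollouts are assumptions for exposition, not the paper's actual implementation.

```python
# Hypothetical sketch of double-dropout rollout generation.
# All names and heuristics here are illustrative assumptions.
import numpy as np


def double_dropout_rollouts(models, policy, start_states, horizon,
                            model_dropout_ratio=0.2, rollout_dropout_ratio=0.1,
                            rng=None):
    """Generate imagined rollouts with model-dropout and rollout-dropout.

    models: list of learned dynamics models, each a callable
            (state, action) -> (next_state, reward), forming an ensemble.
    policy: callable state -> action.
    """
    rng = rng or np.random.default_rng()

    # Model-dropout (assumed): drop a fraction of ensemble members for this
    # batch, trading a little robustness for efficiency.
    n_models = max(1, int(round(len(models) * (1.0 - model_dropout_ratio))))
    kept = [models[i] for i in rng.choice(len(models), n_models, replace=False)]

    rollouts = []
    for state in start_states:
        model = kept[rng.integers(len(kept))]
        total_reward, transitions = 0.0, []
        for _ in range(horizon):
            action = policy(state)
            state, reward = model(state, action)
            total_reward += reward
            transitions.append((state, action, reward))
        rollouts.append((total_reward, transitions))

    # Rollout-dropout (assumed): discard the fraction of imagined rollouts
    # with the highest returns, a pessimism-style guard against
    # over-optimistic model predictions, at a small cost in sample efficiency.
    rollouts.sort(key=lambda r: r[0])
    n_rollouts = max(1, int(round(len(rollouts) * (1.0 - rollout_dropout_ratio))))
    return [transitions for _, transitions in rollouts[:n_rollouts]]
```

In this reading, the two ratios play the roles described above: raising `rollout_dropout_ratio` favors robustness, while raising `model_dropout_ratio` favors cheaper, more diverse sampling, and tuning them together balances the two objectives.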