Deep Reinforcement Learning (DRL) experiments are commonly performed in simulated environments because of the tremendous number of training samples that deep neural networks demand. In contrast, model-based Bayesian Learning allows a robot to learn good policies within a few trials in the real world. Although it requires fewer iterations, Bayesian methods pay a relatively higher computational cost per trial, and their advantage is strongly tied to dimensionality and noise. Here, we compare a Deep Bayesian Learning algorithm with a model-free DRL algorithm, analyzing results collected from both simulations and real-world experiments. Considering both Sim and Real learning, our experiments show that the sample-efficient Deep Bayesian RL outperforms DRL even when computation time (as opposed to number of iterations) is taken into consideration. Additionally, the difference in computation time between Deep Bayesian RL performed in simulation and in real experiments points to a viable path across the reality gap. We also show that a mix of Sim and Real does not outperform a purely Real approach, suggesting that reality can provide the best prior knowledge for Bayesian Learning. Roboticists design and build robots every day, and our results show that higher learning efficiency in the real world will shorten the time between design and deployment by skipping simulations.