There is a widespread intuition that model-based control methods should be able to surpass the data efficiency of model-free approaches. In this paper we evaluate this intuition on a range of challenging locomotion tasks. We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning; the learned policy serves as a proposal for MPC. We find that well-tuned model-free agents are strong baselines even for high-DoF control problems, but MPC with learned proposals and models (trained on the fly or transferred from related tasks) can significantly improve performance and data efficiency in hard multi-task/multi-goal settings. Finally, we show that it is possible to distil a model-based planner into a policy that amortizes the planning computation without any loss of performance. Videos of agents performing different tasks can be seen at https://sites.google.com/view/mbrl-amortization/home.
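The abstract's core mechanism, a sampling-based MPC loop that uses a learned policy as a proposal distribution for candidate action sequences, can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the dynamics model, proposal policy, reward, and all hyperparameters (`horizon`, `num_samples`, `noise`) are stand-in assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(state, action):
    # Stand-in "learned" dynamics: additive transitions with a reward
    # that favours states near the origin (purely illustrative).
    next_state = state + action
    reward = -np.abs(next_state).sum()
    return next_state, reward

def proposal_policy(state):
    # Stand-in "learned" proposal: nudge the state toward the origin.
    return -0.5 * state

def mpc_plan(state, horizon=5, num_samples=64, noise=0.1):
    """Sample action sequences around the proposal policy, roll each out
    through the learned model, and return the first action of the
    best-scoring sequence (receding-horizon control)."""
    best_return, best_first_action = -np.inf, None
    for _ in range(num_samples):
        s, total, first_action = state.copy(), 0.0, None
        for t in range(horizon):
            # Perturb the proposal's action with exploration noise.
            a = proposal_policy(s) + noise * rng.standard_normal(s.shape)
            if t == 0:
                first_action = a
            s, r = model(s, a)
            total += r
        if total > best_return:
            best_return, best_first_action = total, first_action
    return best_first_action

state = np.array([1.0, -2.0])
action = mpc_plan(state)
```

A good proposal concentrates the sampled sequences near high-return behaviour, which is why the paper's planner benefits from the model-free policy; distillation then trains that policy to imitate the planner's chosen actions, amortizing the planning computation.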