Offline reinforcement learning (RL) algorithms are often designed with environments such as MuJoCo in mind, in which the planning horizon is extremely long and no noise exists. We compare model-free, model-based, and hybrid offline RL approaches on various industrial benchmark (IB) datasets to examine how such algorithms behave in settings closer to real-world problems, including complex noise and partially observable states. We find that on the IB, hybrid approaches face severe difficulties, and that simpler algorithms, such as rollout-based methods or model-free algorithms with simpler regularizers, perform best on these datasets.