We propose a multi-time-scale predictive representation learning method for efficiently learning, entirely offline, robust driving policies that generalize to novel road geometries and to damaged or distracting lane conditions not covered in the training data. We show that the proposed representation can be incorporated easily into an offline (batch) reinforcement learning setting, where it generalizes more effectively and efficiently under novel conditions than standard batch RL methods. Because our method trains on data collected entirely offline in the real world, it removes the need for intensive online exploration, a major obstacle to applying deep reinforcement learning to real-world robot training. We evaluate and analyze these claims through experiments in both simulated and real-world driving scenarios.
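The paper does not specify its architecture in the abstract, but the core idea of multi-time-scale predictive representations can be illustrated with a generic sketch: learn several value-style predictions of the same cumulant under different discount factors, each capturing a different temporal horizon, and stack them as a state representation. All names, the linear predictors, the TD(0) updates, and the toy data below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch only: linear TD(0) predictors at several discount
# factors ("time scales"); the stacked predictions form a representation.
rng = np.random.default_rng(0)

GAMMAS = [0.0, 0.5, 0.9, 0.99]  # short- to long-horizon predictions
N_FEATURES = 8
ALPHA = 0.1                     # TD step size

# one linear predictor (weight vector) per time scale
W = np.zeros((len(GAMMAS), N_FEATURES))

def td_update(W, phi, phi_next, cumulant):
    """One TD(0) step for every time scale on a single transition."""
    for i, gamma in enumerate(GAMMAS):
        v = W[i] @ phi
        v_next = W[i] @ phi_next
        delta = cumulant + gamma * v_next - v  # TD error at this horizon
        W[i] += ALPHA * delta * phi
    return W

def representation(W, phi):
    """Stack the multi-horizon predictions into a compact representation."""
    return W @ phi  # shape: (len(GAMMAS),)

# toy "offline batch": random features, constant cumulant of 1.0
for _ in range(2000):
    phi = rng.random(N_FEATURES)
    phi_next = rng.random(N_FEATURES)
    W = td_update(W, phi, phi_next, cumulant=1.0)

phi = rng.random(N_FEATURES)
print(representation(W, phi))
```

Because every transition updates all predictors, the representation encodes how the same signal evolves over short and long horizons; a downstream batch RL policy would consume these stacked predictions instead of raw features.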