Powered by deep representation learning, reinforcement learning (RL) provides an end-to-end learning framework capable of solving self-driving (SD) tasks without manual design. However, time-varying nonstationary environments cause proficient but specialized RL policies to fail at execution time. For example, an RL-based SD policy trained in sunny weather does not generalize well to rainy weather. Even though meta learning enables the RL agent to adapt to new tasks/environments, its offline operation fails to equip the agent with online adaptation ability when facing nonstationary environments. This work proposes an online meta reinforcement learning algorithm based on \emph{conjectural online lookahead adaptation} (COLA). COLA determines the online adaptation at every step by maximizing the agent's conjecture of its future performance over a lookahead horizon. Experimental results demonstrate that under dynamically changing weather and lighting conditions, COLA-based self-adaptive driving outperforms the baseline policies in terms of online adaptability. A demo video, source code, and appendices are available at {\tt https://github.com/Panshark/COLA}
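To make the adaptation rule concrete, the following is a minimal sketch of one COLA-style update, not the authors' implementation: at each step, the meta-learned policy parameters are adjusted by gradient ascent on a conjectured (model-based) estimate of the return over a short lookahead horizon. The helper {\tt conjectured\_return} and all hyperparameters are hypothetical placeholders.
\begin{verbatim}
import torch

def cola_adaptation_step(meta_params, conjectured_return,
                         horizon=5, lr=0.1, n_inner_steps=1):
    """One conjectural online lookahead adaptation step (sketch).

    conjectured_return(params, horizon): hypothetical differentiable
    function returning the agent's conjecture of its return over the
    next `horizon` steps under policy parameters `params`.
    """
    params = meta_params.clone().requires_grad_(True)
    for _ in range(n_inner_steps):
        # Conjectured lookahead performance under current adapted parameters.
        value = conjectured_return(params, horizon)
        # Gradient ascent: maximize the conjectured future performance.
        (grad,) = torch.autograd.grad(value, params)
        params = (params + lr * grad).detach().requires_grad_(True)
    return params
\end{verbatim}
The adapted parameters are used to act for the next step (or next few steps), after which the conjecture is refreshed with new observations and the update is repeated online.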