Fine-tuning reinforcement learning (RL) models has been challenging because of a lack of large-scale off-the-shelf datasets as well as high variance in transferability among different environments. Recent work has tackled offline RL from the perspective of sequence modeling, with improved results following the introduction of the Transformer architecture. However, when the model is trained from scratch, it suffers from slow convergence. In this paper, we take advantage of this formulation of reinforcement learning as sequence modeling and investigate the transferability of sequence models pre-trained on other domains (vision, language) when fine-tuned on offline RL tasks (control, games). To this end, we also propose techniques to improve transfer between these domains. Results show consistent performance gains in terms of both convergence speed and reward across a variety of environments: using Wikipedia-pretrained and GPT2 language models, training is accelerated by 3-6x and state-of-the-art performance is achieved on a range of tasks. We hope that this work not only highlights the potential of leveraging generic sequence modeling techniques and pre-trained models for RL, but also inspires future work on sharing knowledge between generative modeling tasks across completely different domains.
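To make the sequence-modeling formulation concrete, the following is a minimal sketch (not the authors' implementation) of how a pre-trained GPT2 backbone from the HuggingFace transformers library might be reused inside a Decision-Transformer-style model that consumes (return-to-go, state, action) trajectory tokens. The class name, projection layers, context length, and embedding choices are illustrative assumptions rather than the paper's exact architecture.

```python
# Sketch: fine-tuning a language-pretrained GPT-2 on offline RL trajectories,
# framed as sequence modeling. Assumes `torch` and `transformers` are installed.
import torch
import torch.nn as nn
from transformers import GPT2Model


class PretrainedDecisionTransformer(nn.Module):
    def __init__(self, state_dim, act_dim, hidden_size=768, max_ep_len=1000):
        super().__init__()
        # Load GPT-2 weights pre-trained on language (hidden size 768).
        self.transformer = GPT2Model.from_pretrained("gpt2")
        # Linear projections map returns-to-go, states, and actions into the
        # transformer's embedding space.
        self.embed_return = nn.Linear(1, hidden_size)
        self.embed_state = nn.Linear(state_dim, hidden_size)
        self.embed_action = nn.Linear(act_dim, hidden_size)
        self.embed_timestep = nn.Embedding(max_ep_len, hidden_size)
        self.predict_action = nn.Linear(hidden_size, act_dim)

    def forward(self, returns_to_go, states, actions, timesteps):
        # returns_to_go: (B, K, 1), states: (B, K, state_dim),
        # actions: (B, K, act_dim), timesteps: (B, K) integer indices.
        B, K = states.shape[0], states.shape[1]
        time_emb = self.embed_timestep(timesteps)
        r = self.embed_return(returns_to_go) + time_emb
        s = self.embed_state(states) + time_emb
        a = self.embed_action(actions) + time_emb
        # Interleave tokens as (R_1, s_1, a_1, R_2, s_2, a_2, ...).
        tokens = torch.stack((r, s, a), dim=2).reshape(B, 3 * K, -1)
        hidden = self.transformer(inputs_embeds=tokens).last_hidden_state
        # Predict each next action from the hidden state at the state token.
        state_hidden = hidden.reshape(B, K, 3, -1)[:, :, 1]
        return self.predict_action(state_hidden)
```

Fine-tuning would then minimize, e.g., a mean-squared error between predicted and logged actions on offline trajectories, with the pre-trained transformer weights updated alongside the new projection layers.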