We empirically investigate how pre-training on data of different modalities, such as language and vision, affects the fine-tuning of Transformer-based models on Mujoco offline reinforcement learning tasks. Analysis of the internal representations reveals that the pre-trained Transformers acquire largely different representations before and after fine-tuning, yet absorb less information about the data during fine-tuning than a randomly initialized model. A closer look at the parameter changes of the pre-trained Transformers shows that their parameters change relatively little and that the poor performance of the model pre-trained on image data may partially stem from large gradients and gradient clipping. To study what information the Transformer pre-trained on language data exploits, we fine-tune this model with no context provided and find that it learns efficiently even without context information. Follow-up analysis supports the hypothesis that pre-training on language data likely equips the Transformer with context-like information that it then uses to solve the downstream task.
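To make the representation comparison concrete, the following is a minimal sketch of one common way to quantify how much a layer's representation changes between two checkpoints, using linear Centered Kernel Alignment (CKA) as the similarity measure. The choice of CKA, the helper name `linear_cka`, and the random stand-in activations are illustrative assumptions, not necessarily the exact analysis performed in the paper.

```python
import torch

def linear_cka(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Linear CKA between two activation matrices of shape (n_samples, n_features).

    Returns a value in [0, 1]; higher means more similar representations.
    """
    # Center each feature dimension.
    x = x - x.mean(dim=0, keepdim=True)
    y = y - y.mean(dim=0, keepdim=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = y.t() @ x
    numerator = (cross * cross).sum()
    denominator = torch.linalg.norm(x.t() @ x) * torch.linalg.norm(y.t() @ y)
    return numerator / denominator

# Example: compare a layer's activations before and after fine-tuning
# on the same batch of inputs (random stand-ins for real activations).
acts_before = torch.randn(256, 128)                      # pre-trained model
acts_after = acts_before + 0.5 * torch.randn(256, 128)   # after fine-tuning
print(f"linear CKA: {linear_cka(acts_before, acts_after):.3f}")
```

A low CKA score between the two checkpoints would be consistent with the claim that the representations change substantially during fine-tuning.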
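The point about large gradients and gradient clipping can be checked with standard tooling. The sketch below is a hedged illustration, assuming a PyTorch training loop with a stand-in model and an illustrative `max_norm` value; it logs the pre-clipping gradient norm returned by `clip_grad_norm_` to see how often clipping is triggered.

```python
import torch
from torch import nn

# Stand-in model and data; in the actual setup this would be the
# Transformer policy being fine-tuned on offline RL trajectories.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 8))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

for step in range(10):
    inputs, targets = torch.randn(16, 32), torch.randn(16, 8)
    loss = loss_fn(model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()
    # clip_grad_norm_ returns the total gradient norm *before* clipping,
    # so logging it shows whether clipping is being hit at each step.
    grad_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.25)
    if grad_norm > 0.25:
        print(f"step {step}: clipped (pre-clip norm {grad_norm:.2f})")
    optimizer.step()
```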