Recent work has shown the promise of creating generalist, transformer-based policies for language, vision, and sequential decision-making problems. Creating such models generally requires centralized training objectives, data, and compute. It is of interest whether we can create generalist policies more flexibly by merging multiple task-specific, individually trained policies. In this work, we take a preliminary step in this direction by merging, or averaging, subsets of Decision Transformers in weight space that were trained on different MuJoCo locomotion problems, forming multi-task models without centralized training. We also propose that merging policies yields better results if all policies start from a common, pre-trained initialization and are co-trained on shared auxiliary tasks during problem-specific finetuning. In general, we believe research in this direction can help democratize and distribute the process of forming generally capable agents.
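The core operation described above, averaging policies in weight space, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function and parameter names are hypothetical, and real Decision Transformer checkpoints would be tensors rather than the toy lists used here, with all policies sharing one architecture and initialization so that parameters align.

```python
# Minimal sketch of weight-space merging by elementwise parameter averaging.
# Illustrative only: real policies would be Decision Transformer state dicts
# (matching architectures, shared pre-trained initialization).

def merge_policies(state_dicts):
    """Average each named parameter across a list of task-specific policies."""
    merged = {}
    for name in state_dicts[0]:
        n_params = len(state_dicts[0][name])
        merged[name] = [
            sum(sd[name][i] for sd in state_dicts) / len(state_dicts)
            for i in range(n_params)
        ]
    return merged

# Two toy "policies" with identical parameter layouts.
policy_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
policy_b = {"layer.weight": [3.0, 4.0], "layer.bias": [2.0]}

merged = merge_policies([policy_a, policy_b])
print(merged)
# {'layer.weight': [2.0, 3.0], 'layer.bias': [1.0]}
```

Averaging is only meaningful when the policies lie in a shared region of weight space, which is why the abstract emphasizes a common pre-trained initialization: independently initialized networks generally cannot be merged this way.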