Traditional approaches to RL have focused on learning decision policies directly from episodic decisions, while slowly and implicitly learning the semantics of compositional representations needed for generalization. While some approaches refine representations via auxiliary self-supervised losses learned jointly with the decision policy, compositional representations learned from hand-designed, context-independent self-supervised losses (e.g., multi-view) still adapt slowly to the real world, which contains many non-IID subspaces and demands rapid shifts in both temporal and spatial attention patterns at varying levels of abstraction. In contrast, supervised language-model cascades have shown the flexibility to adapt to many diverse manifolds, along with hints of the self-learning needed for autonomous task transfer. To date, however, transfer methods for language models such as few-shot learning and fine-tuning still require human supervision, and transfer via self-learning methods remains underexplored. We propose a self-supervised loss policy, called contrastive distillation, which manifests latent variables with high mutual information with both source and target tasks from weights to tokens. We show that this outperforms common transfer-learning methods and suggests a useful design axis for online transfer: trading off compute for generalizability. Contrastive distillation is further improved by sampling from memory, which suggests a simple algorithm for sampling negative examples for contrastive losses more efficiently than random sampling.
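The closing claim — that sampling negatives from memory beats uniform random sampling — can be illustrated with a minimal sketch. This is not the paper's implementation; the function names, the InfoNCE form of the loss, and the top-k hard-negative heuristic are all assumptions made for illustration:

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE contrastive loss for a single anchor.

    anchor, positive: (d,) vectors; negatives: (k, d) matrix.
    Returns -log p(positive | anchor) under a softmax over
    cosine similarities, so the loss is strictly positive.
    """
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

    pos_logit = cos(anchor, positive) / temperature
    neg_logits = np.array([cos(anchor, n) for n in negatives]) / temperature
    logits = np.concatenate([[pos_logit], neg_logits])
    logits -= logits.max()  # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

def sample_negatives_from_memory(anchor, memory, k):
    """Hypothetical memory-based negative sampling: rather than drawing
    negatives uniformly at random, prefer the memory entries most
    similar to the anchor (hard negatives), which carry more gradient
    signal per sample."""
    sims = memory @ anchor / (
        np.linalg.norm(memory, axis=1) * np.linalg.norm(anchor) + 1e-8)
    hardest = np.argsort(-sims)[:k]  # top-k most similar entries
    return memory[hardest]
```

Under these assumptions, hard negatives drawn from memory sit closer to the anchor than uniform draws, so each contrastive update discriminates against more confusable examples — the intuition behind the claimed efficiency gain over random sampling.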