Machine learning is shifting towards general-purpose pretrained generative models, trained in a self-supervised manner on large amounts of data, which can then be applied to a wide range of tasks. However, due to their generic training methodology, these models often fail to meet some downstream requirements (e.g. hallucination in abstractive summarization or wrong format in automatic code generation). This raises the important question of how to adapt pretrained generative models to a new task without destroying their capabilities. Recent work has suggested solving this problem by representing task-specific requirements through energy-based models (EBMs) and approximating these EBMs using distributional policy gradients (DPG). Unfortunately, this approach is limited to unconditional distributions, represented by unconditional EBMs. In this paper, we extend this approach to conditional tasks by proposing Conditional DPG (CDPG). We evaluate CDPG on three different control objectives across two tasks: summarization with T5 and code generation with GPT-Neo. Our results show that fine-tuning using CDPG robustly moves these pretrained models closer towards meeting control objectives and, in contrast with baseline approaches, does not result in catastrophic forgetting.
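To make the core idea concrete, below is a minimal, assumption-laden sketch of what a single CDPG-style update could look like: for each context, samples are drawn from the current policy, reweighted by the per-context EBM score divided by an estimate of the per-context normalizing constant, and used in a reweighted negative log-likelihood step. The interfaces `policy.sample`, `policy.log_prob`, and `ebm_score` are hypothetical placeholders, not the paper's actual implementation.

```python
import torch

def cdpg_step(policy, contexts, ebm_score, optimizer, samples_per_context=4):
    """One illustrative CDPG-style fine-tuning step (sketch, not the paper's code).

    Assumed interfaces (hypothetical):
      policy.sample(c)      -> (tokens, log_prob) for a continuation drawn from pi_theta(.|c)
      policy.log_prob(c, x) -> differentiable log pi_theta(x|c)
      ebm_score(c, x)       -> unnormalized EBM score P_c(x), e.g. pretrained
                               model probability times a binary constraint indicator
    """
    loss = 0.0
    for c in contexts:
        # Draw continuations from the current policy for this context.
        samples = [policy.sample(c) for _ in range(samples_per_context)]
        with torch.no_grad():
            # Importance weights P_c(x) / pi_theta(x|c) and a per-context estimate of Z_c.
            weights = [ebm_score(c, x) / lp.exp() for (x, lp) in samples]
            z_c = sum(weights) / len(weights)
        for (x, _), w in zip(samples, weights):
            # Reweighted NLL: pushes pi_theta(.|c) towards p_c = P_c / Z_c.
            loss = loss - (w / z_c) * policy.log_prob(c, x)
    loss = loss / (len(contexts) * samples_per_context)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The per-context normalization by an estimated `z_c` is what distinguishes this conditional setting from unconditional DPG, where a single partition function suffices.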