Machine learning is shifting towards general-purpose pretrained generative models, trained in a self-supervised manner on large amounts of data, which can then be applied to a wide range of tasks. However, due to their generic training methodology, these models often fail to meet some downstream requirements (e.g. hallucination in abstractive summarization or wrong format in automatic code generation). This raises the important question of how to adapt pretrained generative models to a new task without destroying their capabilities. Recent work has suggested solving this problem by representing task-specific requirements through energy-based models (EBMs) and approximating these EBMs using distributional policy gradients (DPG). Unfortunately, this approach is limited to unconditional distributions, represented by unconditional EBMs. In this paper, we extend this approach to conditional tasks by proposing Conditional DPG (CDPG). We evaluate CDPG on three different control objectives across two tasks: summarization with T5 and code generation with GPT-Neo. Our results show that fine-tuning using CDPG robustly moves these pretrained models closer towards meeting control objectives and, in contrast with baseline approaches, does not result in catastrophic forgetting.
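To make the core idea concrete, below is a minimal, assumption-laden sketch of what a single CDPG-style update could look like: for each context, samples are drawn from the current policy, reweighted by the per-context EBM score divided by an estimate of the per-context normalizing constant, and used in a reweighted negative log-likelihood step. The interfaces `policy.sample`, `policy.log_prob`, and `ebm_score` are hypothetical placeholders, not the paper's actual implementation.

```python
import torch

def cdpg_step(policy, contexts, ebm_score, optimizer, samples_per_context=4):
    """One illustrative CDPG-style fine-tuning step (sketch, not the paper's code).

    Assumed interfaces (hypothetical):
      policy.sample(c)      -> (tokens, log_prob) for a continuation drawn from pi_theta(.|c)
      policy.log_prob(c, x) -> differentiable log pi_theta(x|c)
      ebm_score(c, x)       -> unnormalized EBM score P_c(x), e.g. pretrained
                               model probability times a binary constraint indicator
    """
    loss = 0.0
    for c in contexts:
        # Draw continuations from the current policy for this context.
        samples = [policy.sample(c) for _ in range(samples_per_context)]
        with torch.no_grad():
            # Importance weights P_c(x) / pi_theta(x|c) and a per-context estimate of Z_c.
            weights = [ebm_score(c, x) / lp.exp() for (x, lp) in samples]
            z_c = sum(weights) / len(weights)
        for (x, _), w in zip(samples, weights):
            # Reweighted NLL: pushes pi_theta(.|c) towards p_c = P_c / Z_c.
            loss = loss - (w / z_c) * policy.log_prob(c, x)
    loss = loss / (len(contexts) * samples_per_context)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The per-context normalization by an estimated `z_c` is what distinguishes this conditional setting from unconditional DPG, where a single partition function suffices.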