Towards Controllable Diffusion Models via Reward-Guided Exploration - Zhuanzhi Papers


Topics: diffusion models · controllability · samples · classifiers · gradients

April 14, 2023


Hengtong Zhang, Tingyang Xu

By formulating the formation of data samples as a Markov denoising process, diffusion models achieve state-of-the-art performance across a range of tasks. Recently, many variants of diffusion models have been proposed to enable controlled sample generation. Most of these existing methods either formulate the controlling information as an input (i.e., a conditional representation) to the noise approximator, or introduce a pre-trained classifier in the test phase to guide the Langevin dynamics towards the conditional goal. However, the former line of methods works only when the controlling information can be formulated as a conditional representation, while the latter requires the pre-trained guidance classifier to be differentiable. In this paper, we propose a novel framework named RGDM (Reward-Guided Diffusion Model) that guides the training phase of diffusion models via reinforcement learning (RL). The proposed training framework bridges the objectives of weighted log-likelihood and maximum entropy RL, which enables calculating policy gradients via samples from a payoff distribution proportional to exponentially scaled rewards, rather than from the policies themselves. Such a framework alleviates high gradient variance and enables diffusion models to explore for highly rewarded samples in the reverse process. Experiments on 3D shape and molecule generation tasks show significant improvements over existing conditional diffusion models.

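The abstract's idea of drawing training samples from a payoff distribution proportional to exponentially scaled rewards can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a small finite candidate set, a hypothetical `temperature` parameter that scales the rewards, and hypothetical function names (`payoff_distribution`, `weighted_nll`, `sample_from_payoff`). It only shows how reward-proportional weights turn into a weighted log-likelihood objective.

```python
import math
import random


def payoff_distribution(rewards, temperature=1.0):
    """Return probabilities proportional to exp(reward / temperature).

    `temperature` is an assumed hyperparameter; the abstract does not name one.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [r / temperature for r in rewards]
    # Subtract the maximum before exponentiating for numerical stability.
    shift = max(scaled)
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def weighted_nll(log_likelihoods, rewards, temperature=1.0):
    """Negative log-likelihood averaged under the payoff distribution.

    `log_likelihoods[i]` stands in for the model's log-probability of
    candidate i (in a diffusion model this would come from the reverse process).
    """
    probs = payoff_distribution(rewards, temperature)
    return -sum(p * ll for p, ll in zip(probs, log_likelihoods))


def sample_from_payoff(candidates, rewards, temperature=1.0, rng=None):
    """Draw one candidate with probability given by the payoff distribution."""
    rng = rng or random.Random(0)
    probs = payoff_distribution(rewards, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]


if __name__ == "__main__":
    rewards = [0.0, 1.0, 2.0]
    print(payoff_distribution(rewards))
    print(weighted_nll([-1.0, -2.0, -3.0], rewards))
    print(sample_from_payoff(["a", "b", "c"], rewards))
```

In this sketch, candidates are drawn according to their rewards rather than from the model itself, and the model would then be trained to raise the likelihood of those draws. Lowering `temperature` concentrates the distribution on the highest-reward candidates.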
