Generative Flow Networks (GFlowNets) are related to Markov chain Monte Carlo methods (as they sample from a distribution specified by an energy function), reinforcement learning (as they learn a policy to sample composed objects through a sequence of steps), generative models (as they learn to represent and sample from a distribution), and amortized variational methods (as they can be used to learn to approximate and sample from an otherwise intractable posterior, given a prior and a likelihood). They are trained to generate an object $x$ through a sequence of steps with probability proportional to some reward function $R(x)$ (or $\exp(-\mathcal{E}(x))$, where $\mathcal{E}(x)$ denotes the energy function), which is given only at the end of the generative trajectory. As in other RL settings where the reward is provided only at the end, training efficiency and credit assignment may suffer when trajectories are long. In previous GFlowNet work, no learning was possible from incomplete trajectories (those lacking a terminal state and thus an associated reward). In this paper, we consider the case where the energy function can be applied not just to terminal states but also to intermediate states. This is achieved, for example, when the energy function is additive, with terms available along the trajectory. We show how to reparameterize the GFlowNet state flow function to take advantage of the partial reward already accrued at each state. This enables a training objective that can update parameters even from incomplete trajectories. Even when complete trajectories are available, obtaining more localized credit and gradients is found to speed up training convergence, as demonstrated across many simulations.
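To illustrate the reparameterization described above, consider the detailed balance condition over a single transition $s \to s'$, written with a state flow function $F(s)$, forward policy $P_F$, and backward policy $P_B$ (the specific form below is a sketch consistent with this description, not necessarily the exact objective used in the paper):
$$ F(s)\, P_F(s' \mid s) = F(s')\, P_B(s \mid s'). $$
If the energy $\mathcal{E}(s)$ of an intermediate state $s$ is available, one can, as an illustrative assumption, reparameterize the flow as $F(s) = \tilde{F}_\theta(s)\, e^{-\mathcal{E}(s)}$, so that $\tilde{F}_\theta$ only needs to model the flow not already explained by the partial reward accrued at $s$. Taking logarithms yields a per-transition squared loss
$$ \mathcal{L}(s, s') = \Big( \log \tilde{F}_\theta(s) - \mathcal{E}(s) + \log P_F(s' \mid s; \theta) - \log \tilde{F}_\theta(s') + \mathcal{E}(s') - \log P_B(s \mid s'; \theta) \Big)^2, $$
which depends only on a single transition and the partial energies at its endpoints, and can therefore be computed on incomplete trajectories.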