Learning to Synthesize Programs as Interpretable and Generalizable Policies - 专知 (Zhuanzhi) Papers


January 31, 2022

Learning to Synthesize Programs as Interpretable and Generalizable Policies


Dweep Trivedi, Jesse Zhang, Shao-Hua Sun, Joseph J. Lim

From arXiv; NeurIPS 2021. 53 pages, 16 figures, 12 tables. Website: https://clvrai.github.io/leaps/

Recently, deep reinforcement learning (DRL) methods have achieved impressive performance on tasks in a variety of domains. However, neural network policies produced with DRL methods are not human-interpretable and often have difficulty generalizing to novel scenarios. To address these issues, prior works explore learning programmatic policies that are more interpretable and structured for generalization. Yet, these works either employ limited policy representations (e.g. decision trees, state machines, or predefined program templates) or require stronger supervision (e.g. input/output state pairs or expert demonstrations). We present a framework that instead learns to synthesize a program, which details the procedure to solve a task in a flexible and expressive manner, solely from reward signals. To alleviate the difficulty of learning to compose programs to induce the desired agent behavior from scratch, we propose to first learn a program embedding space that continuously parameterizes diverse behaviors in an unsupervised manner and then search over the learned program embedding space to yield a program that maximizes the return for a given task. Experimental results demonstrate that the proposed framework not only learns to reliably synthesize task-solving programs but also outperforms DRL and program synthesis baselines while producing interpretable and more generalizable policies. We also justify the necessity of the proposed two-stage learning scheme as well as analyze various methods for learning the program embedding.
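The second stage described above, searching a learned program embedding space for a program that maximizes return, can be illustrated with a small sketch. The abstract does not name the search algorithm, so the cross-entropy method used here is an assumption, chosen as one common gradient-free option. The names `decode`, `episode_return`, and `cem_search`, the two-token program vocabulary, and the toy reward are all hypothetical stand-ins: in the actual framework the decoder would come from the unsupervised first stage, and the return from executing the program in an environment.

```python
import random
import statistics


def decode(z):
    """Hypothetical decoder: map a 2-D latent vector to a token list.

    Stands in for a learned program decoder; not the paper's model.
    """
    n_move = max(0, min(10, round(z[0])))
    n_turn = max(0, min(10, round(z[1])))
    return ["move"] * n_move + ["turn"] * n_turn


def episode_return(program):
    """Toy reward (assumption): maximal (0) for exactly 3 moves and 1 turn."""
    return -(abs(program.count("move") - 3) + abs(program.count("turn") - 1))


def cem_search(dim=2, pop_size=64, n_elite=8, n_iters=30, seed=0):
    """Cross-entropy method over latent vectors, using only scalar returns."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [2.0] * dim
    # Start from the initial mean so the result is never worse than it.
    best_z = list(mean)
    best_ret = episode_return(decode(best_z))
    for _ in range(n_iters):
        # Sample candidate embeddings around the current mean.
        population = [
            [rng.gauss(mean[d], std[d]) for d in range(dim)]
            for _ in range(pop_size)
        ]
        # Rank candidates by the return of their decoded programs.
        ranked = sorted(
            population,
            key=lambda z: episode_return(decode(z)),
            reverse=True,
        )
        elites = ranked[:n_elite]
        top_ret = episode_return(decode(elites[0]))
        if top_ret > best_ret:
            best_z, best_ret = elites[0], top_ret
        # Refit the sampling distribution to the elite candidates.
        mean = [statistics.fmean([z[d] for z in elites]) for d in range(dim)]
        std = [
            max(0.1, statistics.pstdev([z[d] for z in elites]))
            for d in range(dim)
        ]
    return best_z, best_ret


best_z, best_ret = cem_search()
print(decode(best_z), best_ret)
```

The search only ever sees scalar returns, which matches the reward-only setting of the abstract. The floor of 0.1 on the standard deviation is a design choice of this sketch to keep the sampler from collapsing on a plateau of the piecewise-constant toy reward.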

