用于方案合成和语义分解的内存增强政策优化 (Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing) - 专知论文

会员服务 ·

0

MAPO · Buffer（公司） · 优化器 · Weight · 语义分析 ·

2018 年 10 月 31 日

Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing

翻译：用于方案合成和语义分解的内存增强政策优化

Chen Liang,Mohammad Norouzi,Jonathan Berant,Quoc Le,Ni Lao

from arxiv, 16 Pages, 4 figures, 6 tables, accepted as a spotlight paper for NIPS 2018

We present Memory Augmented Policy Optimization (MAPO), a simple and novel way to leverage a memory buffer of promising trajectories to reduce the variance of policy gradient estimate. MAPO is applicable to deterministic environments with discrete actions, such as structured prediction and combinatorial optimization tasks. We express the expected return objective as a weighted sum of two terms: an expectation over the high-reward trajectories inside the memory buffer, and a separate expectation over trajectories outside the buffer. To make an efficient algorithm of MAPO, we propose: (1) memory weight clipping to accelerate and stabilize training; (2) systematic exploration to discover high-reward trajectories; (3) distributed sampling from inside and outside of the memory buffer to scale up training. MAPO improves the sample efficiency and robustness of policy gradient, especially on tasks with sparse rewards. We evaluate MAPO on weakly supervised program synthesis from natural language (semantic parsing). On the WikiTableQuestions benchmark, we improve the state-of-the-art by 2.6%, achieving an accuracy of 46.3%. On the WikiSQL benchmark, MAPO achieves an accuracy of 74.9% with only weak supervision, outperforming several strong baselines with full supervision. Our source code is available at https://github.com/crazydonkey200/neural-symbolic-machines

翻译：我们提出记忆增强政策优化(MAPO),这是利用有希望的轨迹的记忆缓冲来减少政策梯度估计差异的一种简单和新颖的方法,用记忆增强政策优化(MAPO)来利用有希望的轨迹的记忆缓冲,以降低政策梯度估计的差异。MAPO适用于具有分立行动的确定性环境,例如结构化预测和组合优化任务。我们表示预期的回归目标为两个条件的加权和加权之和:对记忆缓冲内高回报轨迹的预期,以及对缓冲外轨迹的单独期望。为了对MAPO的高效算法,我们提议:(1) 加快和稳定培训;(2) 系统探索以发现高回报轨迹;(3) 从记忆缓冲内外分散取样,以扩大培训。MAPO提高政策梯度的抽样效率和稳健性,特别是在报酬微薄的任务上。我们评估MAPO关于自然语言受监管不力的节目合成(Smantical pary)。关于维基表质基准,我们改进了状态-艺术的算法,达到46.3%的精确度。

1

相关内容

MAPO

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

专知会员服务

43+阅读 · 2020年4月22日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【文献综述】分布式机器学习综述论文，33页pdf，A Survey on Distributed Machine Learning

【文献综述】分布式机器学习综述论文，33页pdf，A Survey on Distributed Machine Learning

专知会员服务

124+阅读 · 2019年12月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

282+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Semi-supervised Node Classification via Hierarchical Graph Convolutional Networks

Arxiv

14+阅读 · 2019年3月5日

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

Arxiv

8+阅读 · 2018年12月18日

Symbolic Priors for RNN-based Semantic Parsing

Symbolic Priors for RNN-based Semantic Parsing

Arxiv

3+阅读 · 2018年9月20日

Relational Deep Reinforcement Learning

Relational Deep Reinforcement Learning

Arxiv

10+阅读 · 2018年6月28日

Learning to Update for Object Tracking

Arxiv

8+阅读 · 2018年6月19日

Efficient semantic image segmentation with superpixel pooling

Arxiv

6+阅读 · 2018年6月7日

Deep Representation Learning for Domain Adaptation of Semantic Image Segmentation

Arxiv

10+阅读 · 2018年5月10日

Deep Semantic Hashing with Generative Adversarial Networks

Arxiv

5+阅读 · 2018年4月23日

Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations

Arxiv

9+阅读 · 2018年4月22日

Pyramidal RoR for Image Classification

Arxiv

3+阅读 · 2017年10月1日

VIP会员

文章信息

相关主题

Buffer（公司）

相关VIP内容

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

专知会员服务

43+阅读 · 2020年4月22日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【文献综述】分布式机器学习综述论文，33页pdf，A Survey on Distributed Machine Learning

【文献综述】分布式机器学习综述论文，33页pdf，A Survey on Distributed Machine Learning

专知会员服务

124+阅读 · 2019年12月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

282+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

大语言模型智能体强化学习：全景综述

《城市滨海地区：理解复杂多变环境下的指挥控制框架》50页报告

【伯克利博士论文】从推理服务到训练：面向大规模 LLM 智能体的高效系统

美空军“顶点2025”实验：推进AI在C2、动态目标锁定与联盟集成中的应用

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Semi-supervised Node Classification via Hierarchical Graph Convolutional Networks

Arxiv

14+阅读 · 2019年3月5日

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

Arxiv

8+阅读 · 2018年12月18日

Symbolic Priors for RNN-based Semantic Parsing

Symbolic Priors for RNN-based Semantic Parsing

Arxiv

3+阅读 · 2018年9月20日

Relational Deep Reinforcement Learning

Relational Deep Reinforcement Learning

Arxiv

10+阅读 · 2018年6月28日

Learning to Update for Object Tracking

Arxiv

8+阅读 · 2018年6月19日

Efficient semantic image segmentation with superpixel pooling

Arxiv

6+阅读 · 2018年6月7日

Deep Representation Learning for Domain Adaptation of Semantic Image Segmentation

Arxiv

10+阅读 · 2018年5月10日

Deep Semantic Hashing with Generative Adversarial Networks

Arxiv

5+阅读 · 2018年4月23日

Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations

Arxiv

9+阅读 · 2018年4月22日

Pyramidal RoR for Image Classification

Arxiv

3+阅读 · 2017年10月1日

微信扫码咨询专知VIP会员