Large-scale AI systems that combine search and learning have reached super-human levels of performance in game-playing, but have also been shown to fail in surprising ways. The brittleness of such models limits their efficacy and trustworthiness in real-world deployments. In this work, we systematically study one such algorithm, AlphaZero, and identify two phenomena related to the nature of exploration. First, we find evidence of policy-value misalignment -- for many states, AlphaZero's policy and value predictions contradict each other, revealing a tension between accurate move-selection and value estimation in AlphaZero's objective. Further, we find inconsistency within AlphaZero's value function, which causes it to generalize poorly, despite its policy playing an optimal strategy. From these insights we derive VISA-VIS: a novel method that improves policy-value alignment and value robustness in AlphaZero. Experimentally, we show that our method reduces policy-value misalignment by up to 76%, reduces value generalization error by up to 50%, and reduces average value error by up to 55%.