PAC: Assisted Value Factorisation with Counterfactual Predictions in Multi-Agent Reinforcement Learning - 专知 (Zhuanzhi) Papers


September 26, 2022

PAC: Assisted Value Factorisation with Counterfactual Predictions in Multi-Agent Reinforcement Learning

Hanhan Zhou,Tian Lan,Vaneet Aggarwal

From arXiv; accepted at NeurIPS 2022.

Multi-agent reinforcement learning (MARL) has witnessed significant progress with the development of value function factorization methods. These methods allow a joint action-value function to be optimized by maximizing factorized per-agent utilities, which is possible because of a monotonicity constraint. In this paper, we show that in partially observable MARL problems, an agent's ordering over its own actions could impose concurrent constraints (across different states) on the representable function class, causing significant estimation error during training. We tackle this limitation and propose PAC, a new framework leveraging Assistive information generated from Counterfactual Predictions of optimal joint action selection, which enables explicit assistance to value function factorization through a novel counterfactual loss. A variational inference-based information encoding method is developed to collect and encode the counterfactual predictions from an estimated baseline. To enable decentralized execution, we also derive factorized per-agent policies inspired by a maximum-entropy MARL framework. We evaluate the proposed PAC on multi-agent predator-prey and a set of StarCraft II micromanagement tasks. Empirical results show that PAC outperforms state-of-the-art value-based and policy-based multi-agent reinforcement learning algorithms on all benchmarks.
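
The monotonicity property mentioned in the abstract can be illustrated with a small sketch. This is not the PAC architecture or its counterfactual loss; it is a simplified, QMIX-style linear mixer written only for illustration, and every name in it (`monotonic_mix`, `greedy_joint_action`, `brute_force_joint_action`, the example utilities and weights) is an assumption that does not come from the paper. It shows that when the joint value is non-decreasing in each per-agent utility, letting every agent pick its own best action yields the same joint action as a centralized search.

```python
import itertools
import numpy as np


def monotonic_mix(utilities, weights, bias):
    """Joint value as a weighted sum of per-agent utilities.

    Taking abs() of the weights keeps them non-negative, so the joint
    value is monotonically non-decreasing in every agent's utility.
    """
    return float(np.dot(np.abs(weights), utilities) + bias)


def greedy_joint_action(per_agent_q):
    """Decentralized selection: each agent maximizes its own utility."""
    return tuple(int(np.argmax(q)) for q in per_agent_q)


def brute_force_joint_action(per_agent_q, weights, bias):
    """Centralized selection: enumerate every joint action."""
    best, best_val = None, -np.inf
    action_ranges = [range(len(q)) for q in per_agent_q]
    for joint in itertools.product(*action_ranges):
        utilities = np.array([q[a] for q, a in zip(per_agent_q, joint)])
        val = monotonic_mix(utilities, weights, bias)
        if val > best_val:
            best, best_val = joint, val
    return best


# Hypothetical utilities: two agents with three actions each.
per_agent_q = [np.array([0.2, 1.5, -0.3]), np.array([0.9, 0.1, 0.4])]
weights = np.array([-0.7, 1.3])  # raw weights; abs() is applied in the mixer
bias = 0.5

decentralized = greedy_joint_action(per_agent_q)
centralized = brute_force_joint_action(per_agent_q, weights, bias)
print(decentralized, centralized)
```

The same constraint is also what restricts the representable function class: a joint value in which one agent's best action depends on what the other agent does cannot be written in this monotonic form, which is the kind of limitation the abstract says PAC aims to address.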
