预算制约下多剂强化学习质量价值共享框架 (A Q-values Sharing Framework for Multiagent Reinforcement Learning under Budget Constraint) - 专知论文

会员服务 ·

0

学成 · 约束 · 强化学习 · Performer · 多代理人模型 ·

2020 年 11 月 29 日

A Q-values Sharing Framework for Multiagent Reinforcement Learning under Budget Constraint

翻译：预算制约下多剂强化学习质量价值共享框架

Changxi Zhu,Ho-fung Leung,Shuyue Hu,Yi Cai

from arxiv, 31 pages, 16 figures, submitted to ACM Transactions on Autonomous and Adaptive Systems (TAAS)

In teacher-student framework, a more experienced agent (teacher) helps accelerate the learning of another agent (student) by suggesting actions to take in certain states. In cooperative multiagent reinforcement learning (MARL), where agents need to cooperate with one another, a student may fail to cooperate well with others even by following the teachers' suggested actions, as the polices of all agents are ever changing before convergence. When the number of times that agents communicate with one another is limited (i.e., there is budget constraint), the advising strategy that uses actions as advices may not be good enough. We propose a partaker-sharer advising framework (PSAF) for cooperative MARL agents learning with budget constraint. In PSAF, each Q-learner can decide when to ask for Q-values and share its Q-values. We perform experiments in three typical multiagent learning problems. Evaluation results show that our approach PSAF outperforms existing advising methods under both unlimited and limited budget, and we give an analysis of the impact of advising actions and sharing Q-values on agents' learning.

翻译：在教师-学生框架内,经验更丰富的代理(教师)通过建议在某些州采取行动,帮助加速另一个代理(学生)的学习。在合作性多试剂强化学习(MARL)中,如果代理人需要相互合作,那么学生可能甚至没有与其他人进行良好的合作,即使按照教师建议的行动,因为所有代理人的政策在趋同之前就一直在发生变化。当代理人相互沟通的次数有限(即存在预算限制)时,使用行动作为建议的建议战略可能不够好。我们建议为合作性多试剂强化学习(MARL)建议一个partaker-Sharer咨询框架(PSAF),在预算限制下,我们分析建议性行动和分享Q值对代理人学习的影响。在PSAF中,每个Qlearner可以决定何时要求Q值并分享Q值。我们在三个典型的多试剂学习问题中进行实验。评价结果显示,我们的PSAF方法超越了预算限制和有限的现有咨询方法,我们分析建议性行动和分享Q值对代理人学习的影响。

0

相关内容

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

112+阅读 · 2020年5月15日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

84+阅读 · 2020年2月18日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【强化学习研讨会|Microsoft Research】安全公平的机器学习（Safe and Fair Machine Learning）

【强化学习研讨会|Microsoft Research】安全公平的机器学习（Safe and Fair Machine Learning）

专知会员服务

16+阅读 · 2019年10月3日

【IJCAI 2019 | tutorial】双重机器学习 Dual Learning for Machine Learning，微软 | Tao Qin

【IJCAI 2019 | tutorial】双重机器学习 Dual Learning for Machine Learning，微软 | Tao Qin

专知会员服务

25+阅读 · 2019年8月12日

【AAMSA 2019 | tutorial】多智能体系统中的认知推理Epistemic Reasoning In Multiagent Systems ,法国雷恩François Schwarzentruber

【AAMSA 2019 | tutorial】多智能体系统中的认知推理Epistemic Reasoning In Multiagent Systems ,法国雷恩François Schwarzentruber

专知会员服务

24+阅读 · 2019年5月14日

【ALT 2019 Tutorials】强化学习的探索性开发（Exploration-Exploitation in Reinforcement Learning）

【ALT 2019 Tutorials】强化学习的探索性开发（Exploration-Exploitation in Reinforcement Learning）

专知会员服务

34+阅读 · 2019年3月21日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Arxiv

26+阅读 · 2020年2月10日

Reinforcement Learning with Perturbed Rewards

Arxiv

4+阅读 · 2018年10月5日

Hierarchical Deep Multiagent Reinforcement Learning

Hierarchical Deep Multiagent Reinforcement Learning

Arxiv

8+阅读 · 2018年9月25日

Multi-task Deep Reinforcement Learning with PopArt

Multi-task Deep Reinforcement Learning with PopArt

Arxiv

4+阅读 · 2018年9月12日

Do deep reinforcement learning agents model intentions?

Arxiv

5+阅读 · 2018年5月21日

Multiagent Soft Q-Learning

Arxiv

11+阅读 · 2018年4月25日

MQGrad: Reinforcement Learning of Gradient Quantization in Parameter Server

Arxiv

4+阅读 · 2018年4月22日

A Study on Overfitting in Deep Reinforcement Learning

Arxiv

7+阅读 · 2018年4月20日

Modeling Others using Oneself in Multi-Agent Reinforcement Learning

Arxiv

4+阅读 · 2018年3月22日

Reinforcement Learning based Recommender System using Biclustering Technique

Arxiv

5+阅读 · 2018年1月17日

VIP会员

文章信息

相关主题

多代理人模型

相关VIP内容

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

112+阅读 · 2020年5月15日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

84+阅读 · 2020年2月18日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【强化学习研讨会|Microsoft Research】安全公平的机器学习（Safe and Fair Machine Learning）

【强化学习研讨会|Microsoft Research】安全公平的机器学习（Safe and Fair Machine Learning）

专知会员服务

16+阅读 · 2019年10月3日

【IJCAI 2019 | tutorial】双重机器学习 Dual Learning for Machine Learning，微软 | Tao Qin

【IJCAI 2019 | tutorial】双重机器学习 Dual Learning for Machine Learning，微软 | Tao Qin

专知会员服务

25+阅读 · 2019年8月12日

【AAMSA 2019 | tutorial】多智能体系统中的认知推理Epistemic Reasoning In Multiagent Systems ,法国雷恩François Schwarzentruber

【AAMSA 2019 | tutorial】多智能体系统中的认知推理Epistemic Reasoning In Multiagent Systems ,法国雷恩François Schwarzentruber

专知会员服务

24+阅读 · 2019年5月14日

【ALT 2019 Tutorials】强化学习的探索性开发（Exploration-Exploitation in Reinforcement Learning）

【ALT 2019 Tutorials】强化学习的探索性开发（Exploration-Exploitation in Reinforcement Learning）

专知会员服务

34+阅读 · 2019年3月21日

热门VIP内容

开通专知VIP会员享更多权益服务

大型语言模型遇上文本属性图：一种融合框架与应用的综述

人工智能赋能自主武器与人类控制第三部分：人类控制与系统操作员 | 35页

【博士论文】用于概率程序与生成模型的变分推断

军事指挥控制系统：2025年5种用途

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Arxiv

26+阅读 · 2020年2月10日

Reinforcement Learning with Perturbed Rewards

Arxiv

4+阅读 · 2018年10月5日

Hierarchical Deep Multiagent Reinforcement Learning

Hierarchical Deep Multiagent Reinforcement Learning

Arxiv

8+阅读 · 2018年9月25日

Multi-task Deep Reinforcement Learning with PopArt

Multi-task Deep Reinforcement Learning with PopArt

Arxiv

4+阅读 · 2018年9月12日

Do deep reinforcement learning agents model intentions?

Arxiv

5+阅读 · 2018年5月21日

Multiagent Soft Q-Learning

Arxiv

11+阅读 · 2018年4月25日

MQGrad: Reinforcement Learning of Gradient Quantization in Parameter Server

Arxiv

4+阅读 · 2018年4月22日

A Study on Overfitting in Deep Reinforcement Learning

Arxiv

7+阅读 · 2018年4月20日

Modeling Others using Oneself in Multi-Agent Reinforcement Learning

Arxiv

4+阅读 · 2018年3月22日

Reinforcement Learning based Recommender System using Biclustering Technique

Arxiv

5+阅读 · 2018年1月17日

微信扫码咨询专知VIP会员