Asynchronous, Option-Based Multi-Agent Policy Gradient: A Conditional Reasoning Approach

January 19, 2023

Xubo Lyu, Amin Banitalebi-Dehkordi, Mo Chen, Yong Zhang

From arXiv; submitted to ICRA 2023

Multi-agent policy gradient methods have demonstrated success in games and robotics but are often limited to problems with low-level action spaces. However, when agents take higher-level, temporally extended actions (i.e., options), it becomes challenging to decide when and how to derive a centralized control policy and its gradient, and how to sample options for all agents without interrupting options that are still executing. This is mostly because agents may choose and terminate their options asynchronously. In this work, we propose a conditional reasoning approach to address this problem and empirically validate its effectiveness on representative option-based multi-agent cooperative tasks.
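To make the asynchrony concrete, here is a minimal toy sketch in Python. It is not the method proposed in the paper; it only illustrates the setting the abstract describes, where agents terminate their options at different time steps and only the terminated agents pick a new option. The `Agent` class, the `step` and `run` helpers, and the fixed option durations are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    option: str = ""
    steps_left: int = 0  # remaining low-level steps of the current option


def step(agents, choose_option):
    """Advance all agents by one low-level time step.

    Only agents whose option has terminated select a new one; the
    others keep executing, so running options are never interrupted.
    Returns the names of the agents that re-selected at this step.
    """
    reselected = []
    for agent in agents:
        if agent.steps_left == 0:
            agent.option, agent.steps_left = choose_option(agent)
            reselected.append(agent.name)
        agent.steps_left -= 1
    return reselected


# Hypothetical fixed option durations (in low-level steps) per agent.
DURATIONS = {"a1": 2, "a2": 3}


def choose_option(agent):
    # Stand-in for a learned option policy: returns (option, duration).
    return f"go-{agent.name}", DURATIONS[agent.name]


def run(num_steps):
    agents = [Agent("a1"), Agent("a2")]
    return [step(agents, choose_option) for _ in range(num_steps)]


if __name__ == "__main__":
    for t, names in enumerate(run(6)):
        print(t, names)
```

With these assumed durations, both agents select an option at step 0, but afterwards their decision points drift apart (a1 at steps 2 and 4, a2 at step 3). A centralized policy that samples a joint option for all agents at every decision point would therefore have to either interrupt a running option or handle the agents that are still busy, which is the difficulty the abstract points to.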
