We consider the problem of communicating exogenous information by means of Markov decision process trajectories. This setting, which we call a Markov coding game (MCG), generalizes both source coding and a large class of referential games. MCGs also isolate a problem that is important in decentralized control settings in which cheap talk is not available -- namely, they require balancing the value of communication against its cost. We contribute a theoretically grounded approach to MCGs, based on maximum entropy reinforcement learning and minimum entropy coupling, that we call MEME. Due to recent breakthroughs in approximation algorithms for minimum entropy coupling, MEME is not merely a theoretical algorithm, but can be applied to practical settings. Empirically, we show both that MEME is able to outperform a strong baseline on small MCGs and that MEME is able to achieve strong performance on extremely large MCGs. To the latter point, we demonstrate that MEME is able to losslessly communicate binary images via trajectories of Cartpole and Pong, while simultaneously achieving maximal or near-maximal expected returns, and that it is even capable of performing well in the presence of actuator noise.
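To make the minimum-entropy-coupling ingredient concrete, the following is a minimal sketch of a greedy approximation heuristic of the kind the abstract alludes to: given two discrete marginals, it repeatedly matches the largest remaining mass in each and places their minimum on the joint. This is an illustrative sketch of the general greedy heuristic, not the paper's implementation; the function name and tolerance are our own choices.

```python
import heapq

def greedy_min_entropy_coupling(p, q):
    """Greedy heuristic for an approximate minimum entropy coupling of two
    discrete marginal distributions p and q: repeatedly pair the largest
    remaining mass in each marginal and assign their minimum to the joint.
    Returns a dict mapping (i, j) index pairs to joint probabilities."""
    # heapq is a min-heap, so store negated probabilities to pop the
    # largest remaining mass first.
    hp = [(-pi, i) for i, pi in enumerate(p) if pi > 0]
    hq = [(-qj, j) for j, qj in enumerate(q) if qj > 0]
    heapq.heapify(hp)
    heapq.heapify(hq)
    joint = {}
    while hp and hq:
        neg_pi, i = heapq.heappop(hp)
        neg_qj, j = heapq.heappop(hq)
        m = min(-neg_pi, -neg_qj)  # transferable mass for this pair
        joint[(i, j)] = joint.get((i, j), 0.0) + m
        # Push back whichever marginal still has leftover mass.
        if -neg_pi - m > 1e-12:
            heapq.heappush(hp, (neg_pi + m, i))
        if -neg_qj - m > 1e-12:
            heapq.heappush(hq, (neg_qj + m, j))
    return joint
```

For example, coupling p = [0.5, 0.5] with q = [0.5, 0.25, 0.25] yields the joint {(0, 0): 0.5, (1, 1): 0.25, (1, 2): 0.25}, which preserves both marginals while concentrating the joint mass on few entries (low joint entropy).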