Truthful Self-Play - Zhuanzhi Papers


February 2, 2023

Truthful Self-Play


From arXiv. Accepted for publication at ICLR 2023.

We present a general framework for evolutionary learning of emergent, unbiased state representations without any supervision. Evolutionary frameworks such as self-play converge to bad local optima in multi-agent reinforcement learning in non-cooperative, partially observable environments with communication, owing to information asymmetry. Our proposed framework is a simple modification of self-play inspired by mechanism design, also known as reverse game theory, that elicits truthful signals and makes the agents cooperative. The key idea is to add imaginary rewards using the peer prediction method, i.e., a mechanism for evaluating the validity of information exchanged between agents in a decentralized environment. Numerical experiments with predator prey, traffic junction, and StarCraft tasks demonstrate the state-of-the-art performance of our framework.
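The abstract names peer prediction but does not spell out the mechanism, so the following is only a minimal sketch of the textbook idea, not the paper's formulation. It assumes two agents with binary private signals drawn from a known, positively correlated joint distribution, and scores one agent's report with a logarithmic scoring rule against the other agent's report. The names `JOINT`, `imaginary_reward`, `shaped_reward`, and the weight `beta` are hypothetical and chosen for illustration.

```python
import math

# Assumed joint distribution over the two agents' binary private signals.
# JOINT[a][b] = P(signal_i = a, signal_j = b); the signals are positively correlated.
JOINT = [[0.4, 0.1],
         [0.1, 0.4]]


def posterior(reported):
    """Return [P(signal_j = 0 | signal_i = reported), P(signal_j = 1 | signal_i = reported)]."""
    row = JOINT[reported]
    total = sum(row)
    return [p / total for p in row]


def imaginary_reward(report_i, report_j):
    """Log scoring rule: score agent i's report by how well it predicts agent j's report."""
    return math.log(posterior(report_i)[report_j])


def expected_reward(true_signal, report_i):
    """Expected imaginary reward for agent i, assuming agent j reports truthfully."""
    belief = posterior(true_signal)
    return sum(belief[b] * imaginary_reward(report_i, b) for b in (0, 1))


def shaped_reward(env_reward, report_i, report_j, beta=1.0):
    """Environment reward plus the weighted imaginary reward (beta is a hypothetical weight)."""
    return env_reward + beta * imaginary_reward(report_i, report_j)


if __name__ == "__main__":
    for signal in (0, 1):
        truthful = expected_reward(signal, signal)
        lying = expected_reward(signal, 1 - signal)
        print(f"signal={signal}: truthful={truthful:.3f}, lying={lying:.3f}")
```

Under these assumptions a truthful report earns a strictly higher expected imaginary reward than a misreport, because the logarithmic rule is a strictly proper scoring rule. That is the sense in which such a reward can elicit truthful signals without access to a ground-truth label.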

