从次优示范中学习团队政策半模拟模拟学习 (Semi-Supervised Imitation Learning of Team Policies from Suboptimal Demonstrations) - 专知论文

会员服务 ·

0

Learning · TEAM · Performer · Markovian · 讲稿 ·

2022 年 6 月 13 日

Semi-Supervised Imitation Learning of Team Policies from Suboptimal Demonstrations

翻译：从次优示范中学习团队政策半模拟模拟学习

Sangwon Seo,Vaibhav V. Unhelkar

from arxiv, Extended version of an identically-titled paper accepted at IJCAI 2022

We present Bayesian Team Imitation Learner (BTIL), an imitation learning algorithm to model the behavior of teams performing sequential tasks in Markovian domains. In contrast to existing multi-agent imitation learning techniques, BTIL explicitly models and infers the time-varying mental states of team members, thereby enabling learning of decentralized team policies from demonstrations of suboptimal teamwork. Further, to allow for sample- and label-efficient policy learning from small datasets, BTIL employs a Bayesian perspective and is capable of learning from semi-supervised demonstrations. We demonstrate and benchmark the performance of BTIL on synthetic multi-agent tasks as well as a novel dataset of human-agent teamwork. Our experiments show that BTIL can successfully learn team policies from demonstrations despite the influence of team members' (time-varying and potentially misaligned) mental states on their behavior.

翻译：我们介绍贝叶斯团队模拟学习者(BTIL),这是一种模拟学习算法,用以模拟在马尔科维亚地区执行相继任务的团队的行为。与现有的多剂模拟学习技术相比,BTIL明确模型并推断了团队成员具有时间变化的心理状态,从而能够从次优团队协作的示范中学习分散的团队政策。此外,为了从小型数据集中学习样本和标签效率高的政策,BTIL采用了巴伊西亚视角,能够从半监督的演示中学习。我们展示并衡量了BTIL在合成多剂任务方面的表现以及人类代理团队合作的新数据集。我们的实验表明,尽管团队成员(时间变化和可能出现错乱的)心理状态对其行为产生了影响,但BTIL仍然能够成功地从演示中学习团队政策。

0

相关内容

Learning

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

125+阅读 · 2022年4月21日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【AAAI 2022】一种样本高效的基于模型的保守 actor-critic 算法

【AAAI 2022】一种样本高效的基于模型的保守 actor-critic 算法

专知会员服务

24+阅读 · 2022年1月10日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

光解水制氢纳米光催化材料的表界面基础研究

国家自然科学基金

0+阅读 · 2014年12月31日

Cupriavidus basilensis B-8 对木质素降解机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于透明导电氧化物的表面等离激元电光调制技术研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于A-Train卫星观测的沙尘暴数字重构技术研究

国家自然科学基金

0+阅读 · 2013年12月31日

三维形貌原位数字全息测量的物理行为建模与粒子量化

国家自然科学基金

0+阅读 · 2012年12月31日

Witten Laplacian的特征值及与其相关的Ricci Soliton研究

国家自然科学基金

0+阅读 · 2012年12月31日

一种时空白噪声驱动的Navier-Stokes方程的隐格式

国家自然科学基金

0+阅读 · 2011年12月31日

大型特殊矩阵的降阶和特征值分布

国家自然科学基金

0+阅读 · 2009年12月31日

相依误差下时间序列模型的统计推断

国家自然科学基金

0+阅读 · 2009年12月31日

基于多特征情感信息融合的高效率e-Learning关键技术研究

国家自然科学基金

0+阅读 · 2009年12月31日

STEADY: Simultaneous State Estimation and Dynamics Learning from Indirect Observations

Arxiv

0+阅读 · 2022年8月3日

Multi-Scale User Behavior Network for Entire Space Multi-Task Learning

Arxiv

0+阅读 · 2022年8月3日

Learning to Navigate using Visual Sensor Networks

Arxiv

0+阅读 · 2022年8月1日

Bayesian Active Learning for Sim-to-Real Robotic Perception

Arxiv

0+阅读 · 2022年8月1日

A System for Imitation Learning of Contact-Rich Bimanual Manipulation Policies

Arxiv

0+阅读 · 2022年8月1日

Robot Policy Learning from Demonstration Using Advantage Weighting and Early Termination

Arxiv

0+阅读 · 2022年7月31日

Prompt Distribution Learning

Arxiv

14+阅读 · 2022年5月6日

Decentralized and Communication-Free Multi-Robot Navigation through Distributed Games

Arxiv

40+阅读 · 2021年9月15日

Imitation Learning: Progress, Taxonomies and Opportunities

Arxiv

12+阅读 · 2021年6月23日

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

Arxiv

34+阅读 · 2019年10月24日

VIP会员

文章信息

相关主题

相关VIP内容

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

125+阅读 · 2022年4月21日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【AAAI 2022】一种样本高效的基于模型的保守 actor-critic 算法

【AAAI 2022】一种样本高效的基于模型的保守 actor-critic 算法

专知会员服务

24+阅读 · 2022年1月10日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

大语言模型幻觉：系统综述

《分析与预测陆军战斗体能测试表现：统计与机器学习方法》2025最新137页

【博士论文】数据与任务的物理学：深度学习中的局部性与组合性理论

代理式人工智能时代的决策优势

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

STEADY: Simultaneous State Estimation and Dynamics Learning from Indirect Observations

Arxiv

0+阅读 · 2022年8月3日

Multi-Scale User Behavior Network for Entire Space Multi-Task Learning

Arxiv

0+阅读 · 2022年8月3日

Learning to Navigate using Visual Sensor Networks

Arxiv

0+阅读 · 2022年8月1日

Bayesian Active Learning for Sim-to-Real Robotic Perception

Arxiv

0+阅读 · 2022年8月1日

A System for Imitation Learning of Contact-Rich Bimanual Manipulation Policies

Arxiv

0+阅读 · 2022年8月1日

Robot Policy Learning from Demonstration Using Advantage Weighting and Early Termination

Arxiv

0+阅读 · 2022年7月31日

Prompt Distribution Learning

Arxiv

14+阅读 · 2022年5月6日

Decentralized and Communication-Free Multi-Robot Navigation through Distributed Games

Arxiv

40+阅读 · 2021年9月15日

Imitation Learning: Progress, Taxonomies and Opportunities

Arxiv

12+阅读 · 2021年6月23日

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

Arxiv

34+阅读 · 2019年10月24日

相关基金

光解水制氢纳米光催化材料的表界面基础研究

国家自然科学基金

0+阅读 · 2014年12月31日

Cupriavidus basilensis B-8 对木质素降解机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于透明导电氧化物的表面等离激元电光调制技术研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于A-Train卫星观测的沙尘暴数字重构技术研究

国家自然科学基金

0+阅读 · 2013年12月31日

三维形貌原位数字全息测量的物理行为建模与粒子量化

国家自然科学基金

0+阅读 · 2012年12月31日

Witten Laplacian的特征值及与其相关的Ricci Soliton研究

国家自然科学基金

0+阅读 · 2012年12月31日

一种时空白噪声驱动的Navier-Stokes方程的隐格式

国家自然科学基金

0+阅读 · 2011年12月31日

大型特殊矩阵的降阶和特征值分布

国家自然科学基金

0+阅读 · 2009年12月31日

相依误差下时间序列模型的统计推断

国家自然科学基金

0+阅读 · 2009年12月31日

基于多特征情感信息融合的高效率e-Learning关键技术研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员