This paper addresses collaborative planning problems formalized as Decentralized POMDPs (Dec-POMDPs) by searching for Nash equilibria, i.e., situations where each agent's policy is a best response to the other agents' (fixed) policies. While the Joint Equilibrium-based Search for Policies (JESP) algorithm performs this search in the finite-horizon setting using policy-tree representations, we propose here to adapt it to infinite-horizon Dec-POMDPs by relying on finite-state controller (FSC) policy representations. In this article, we (1) explain how to transform a Dec-POMDP with $N-1$ fixed FSCs into an infinite-horizon POMDP whose optimal solution is a best response for the $N^\text{th}$ agent; (2) propose a JESP variant, called \infJESP, which uses this transformation to solve infinite-horizon Dec-POMDPs; (3) introduce heuristic initializations for JESP aimed at guiding the search toward good solutions; and (4) conduct experiments on state-of-the-art benchmark problems to evaluate our approach.
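To make point (1) concrete, the following is a minimal sketch of the standard best-response construction, written with notation introduced here rather than taken from the abstract: we assume a Dec-POMDP tuple $\langle S, A, T, \Omega, O, r \rangle$, and, for each fixed agent $j \neq i$, an FSC with nodes $n_j$, an action-selection rule $\psi_j(a_j \mid n_j)$, and a node-transition rule $\eta_j(n'_j \mid n_j, a_j, o_j)$. Fixing the $N-1$ other agents' controllers turns the problem faced by agent $i$ into a single-agent POMDP over extended states $(s, \vec{n}_{-i})$ that pair the environment state with the other agents' controller nodes:
\begin{align*}
  \Pr\big(s', \vec{n}'_{-i}, o_i \mid s, \vec{n}_{-i}, a_i\big)
    &= \sum_{\vec{a}_{-i}} \Big[ \prod_{j \neq i} \psi_j(a_j \mid n_j) \Big]\,
       T\big(s' \mid s, (a_i, \vec{a}_{-i})\big) \\
    &\quad \times \sum_{\vec{o}_{-i}} O\big((o_i, \vec{o}_{-i}) \mid s', (a_i, \vec{a}_{-i})\big)
       \prod_{j \neq i} \eta_j\big(n'_j \mid n_j, a_j, o_j\big), \\
  \tilde{r}\big((s, \vec{n}_{-i}), a_i\big)
    &= \sum_{\vec{a}_{-i}} \Big[ \prod_{j \neq i} \psi_j(a_j \mid n_j) \Big]\, r\big(s, (a_i, \vec{a}_{-i})\big).
\end{align*}
Under this construction, any optimal policy of the extended POMDP is, by definition, a best response of agent $i$ to the fixed controllers; \infJESP then iterates such best-response computations over the agents until no agent can improve the joint value, i.e., until a Nash equilibrium is reached.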