Adaptive Belief Discretization for POMDP Planning - Zhuanzhi (专知) Papers



April 15, 2021

Adaptive Belief Discretization for POMDP Planning


Divya Grover, Christos Dimitrakakis

The Partially Observable Markov Decision Process (POMDP) is a widely used model of the interaction between an environment and an agent under state uncertainty. Since the agent does not observe the environment state, its uncertainty is typically represented through a probabilistic belief. The set of possible beliefs is infinite, which makes exact planning intractable, but the complexity of the belief space (and hence of planning) is characterized by its covering number. Many POMDP solvers discretize the belief space uniformly and state the planning error in terms of the (typically unknown) covering number. We instead propose an adaptive belief discretization scheme and give its associated planning error. We furthermore characterize the covering number with respect to the POMDP parameters. This allows us to specify the exact memory requirements of the planner needed to bound the value function error. We then propose a novel, computationally efficient solver using this scheme. We demonstrate that our algorithm is highly competitive with the state of the art in a variety of scenarios.
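To make the terms in the abstract concrete, the following is a minimal Python sketch of a Bayesian belief update and of the uniform belief discretization that the abstract attributes to many existing solvers. It is not the adaptive scheme proposed in the paper, whose details are not given here. The function names, the array layout (`T[a, s, s']` for transitions, `O[a, s', o]` for observations), and the choice of L1 distance are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def belief_update(belief, T, O, action, obs):
    """Bayes filter: b'(s') is proportional to O[a, s', o] * sum_s T[a, s, s'] * b(s).

    Assumed layout (hypothetical): T has shape (actions, states, states),
    O has shape (actions, states, observations).
    """
    predicted = belief @ T[action]                # push belief through the dynamics
    unnormalized = O[action][:, obs] * predicted  # weight by observation likelihood
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalized / total


def uniform_belief_grid(n_states, resolution):
    """All beliefs whose entries are multiples of 1/resolution.

    Enumerated with stars and bars, so the grid has
    C(resolution + n_states - 1, n_states - 1) points.
    """
    last = resolution + n_states - 1
    points = []
    for bars in combinations(range(last), n_states - 1):
        edges = (-1,) + bars + (last,)
        # Number of "stars" between consecutive bars.
        points.append([edges[i + 1] - edges[i] - 1 for i in range(n_states)])
    return np.array(points, dtype=float) / resolution


def nearest_grid_point(belief, grid):
    """Snap a belief to the closest grid belief in L1 distance (an assumed metric)."""
    distances = np.abs(grid - belief).sum(axis=1)
    return grid[np.argmin(distances)]
```

The grid size grows combinatorially with the number of states and the resolution, which illustrates why a uniform discretization can become expensive and why a scheme that refines the grid only where needed is attractive.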


