The Geometry of Memoryless Stochastic Policy Optimization in Infinite-Horizon POMDPs - Zhuanzhi Papers


Stochastic policy · Optimizer · Partially observable Markov decision process · Grid world · Estimation/estimator

April 29, 2022

The Geometry of Memoryless Stochastic Policy Optimization in Infinite-Horizon POMDPs


Johannes Müller, Guido Montúfar

From arXiv; camera-ready version for ICLR 2022, 45 pages, 8 figures

We consider the problem of finding the best memoryless stochastic policy for an infinite-horizon partially observable Markov decision process (POMDP) with finite state and action spaces with respect to either the discounted or mean reward criterion. We show that the (discounted) state-action frequencies and the expected cumulative reward are rational functions of the policy, whereby the degree is determined by the degree of partial observability. We then describe the optimization problem as a linear optimization problem in the space of feasible state-action frequencies subject to polynomial constraints that we characterize explicitly. This allows us to address the combinatorial and geometric complexity of the optimization problem using recent tools from polynomial optimization. In particular, we estimate the number of critical points and use the polynomial programming description of reward maximization to solve a navigation problem in a grid world.
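To make the abstract's first claim concrete, here is a minimal sketch, not the authors' code, that computes discounted state-action frequencies for a memoryless stochastic policy and evaluates the reward as a linear function of them. The names `alpha` (transition kernel), `beta` (observation kernel), `pi` (policy), `mu` (initial distribution), `gamma` (discount) and `r` (reward), the two-state toy POMDP, and the normalization by `1 - gamma` are all assumptions made for illustration. Exact rational arithmetic is used so that the rational dependence on the policy is visible.

```python
from fractions import Fraction as F


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over the rationals."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        piv = next(i for i in range(col, n) if M[i][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [x / pivot for x in M[col]]
        for i in range(n):
            if i != col and M[i][col] != 0:
                factor = M[i][col]
                M[i] = [x - factor * y for x, y in zip(M[i], M[col])]
    return [M[i][n] for i in range(n)]


def state_action_frequencies(alpha, beta, pi, mu, gamma):
    """Return eta[s][a], the normalized discounted state-action frequencies.

    alpha[s][a][t]: probability of moving from state s to state t under action a
    beta[s][o]:     probability of observing o in state s
    pi[o][a]:       memoryless policy, probability of action a given observation o
    """
    nS, nA, nO = len(alpha), len(alpha[0]), len(pi)
    # Effective state policy induced by the observation-based policy.
    tau = [[sum(beta[s][o] * pi[o][a] for o in range(nO)) for a in range(nA)]
           for s in range(nS)]
    # State-to-state transition matrix under tau.
    P = [[sum(tau[s][a] * alpha[s][a][t] for a in range(nA)) for t in range(nS)]
         for s in range(nS)]
    # Discounted state distribution: (I - gamma * P^T) rho = (1 - gamma) * mu.
    A = [[(1 if s == t else 0) - gamma * P[t][s] for t in range(nS)]
         for s in range(nS)]
    rho = solve(A, [(1 - gamma) * m for m in mu])
    return [[rho[s] * tau[s][a] for a in range(nA)] for s in range(nS)]


def expected_reward(eta, r):
    """The reward is linear in the state-action frequencies."""
    return sum(eta[s][a] * r[s][a]
               for s in range(len(r)) for a in range(len(r[0])))


# Hypothetical toy POMDP: two states, a single observation (the agent is blind),
# action 0 stays in the current state, action 1 switches to the other state.
alpha = [[[1, 0], [0, 1]],
         [[0, 1], [1, 0]]]
beta = [[1], [1]]
mu = [F(1), F(0)]
gamma = F(1, 2)
r = [[0, 0], [1, 1]]          # reward 1 whenever the agent is in state 1

pi = [[F(1, 2), F(1, 2)]]     # stay or switch with equal probability
eta = state_action_frequencies(alpha, beta, pi, mu, gamma)
reward = expected_reward(eta, r)
print(reward)                 # prints 1/4
```

Because the frequencies come from solving a linear system whose coefficients are polynomial in the policy entries, each entry of `eta` is a ratio of polynomials in `pi`, which is the structure the abstract describes. Sweeping the stay probability in `pi` over several `Fraction` values is an easy way to see this in the toy example.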


