Reward optimization in fully observable Markov decision processes is equivalent to a linear program over the polytope of state-action frequencies. Taking a similar perspective on partially observable Markov decision processes with memoryless stochastic policies, recent work formulated the problem as the optimization of a linear objective subject to polynomial constraints. Based on this, we present an approach for Reward Optimization in State-Action space (ROSA). We test this approach experimentally in maze navigation tasks and find that ROSA is computationally efficient and can yield stability improvements over existing methods.
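The fully observable case referenced above can be made concrete with a small worked example. The sketch below (not the paper's ROSA implementation; the toy transition kernel P, reward r, initial distribution rho, and discount gamma are illustrative assumptions) solves reward optimization in an MDP as a linear program over discounted state-action frequencies, maximizing the expected reward subject to the flow constraints that characterize the state-action polytope.

```python
# Minimal sketch: MDP reward optimization as a linear program over
# discounted state-action frequencies mu(s, a).
import numpy as np
from scipy.optimize import linprog

S, A = 3, 2                      # number of states and actions (toy sizes)
gamma = 0.9                      # discount factor
rng = np.random.default_rng(0)

P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s']: transition kernel
r = rng.uniform(size=(S, A))                 # r[s, a]: instantaneous reward
rho = np.ones(S) / S                         # initial state distribution

# Decision variable: mu[s, a] >= 0, flattened to a vector of length S * A.
# Flow constraints defining the state-action polytope:
#   sum_a mu(s', a) = (1 - gamma) * rho(s') + gamma * sum_{s, a} P(s'|s, a) * mu(s, a)
A_eq = np.zeros((S, S * A))
for s_next in range(S):
    for s in range(S):
        for a in range(A):
            A_eq[s_next, s * A + a] = (s == s_next) - gamma * P[s, a, s_next]
b_eq = (1 - gamma) * rho

# Maximize sum_{s, a} mu(s, a) * r(s, a)  <=>  minimize the negated objective.
res = linprog(c=-r.flatten(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
mu = res.x.reshape(S, A)

# A deterministic optimal policy can be read off the optimizing frequencies.
policy = mu.argmax(axis=1)
print("optimal state-action frequencies:\n", mu)
print("greedy policy from mu:", policy)
```

In the partially observable setting with memoryless stochastic policies, the feasible set of state-action frequencies is no longer a polytope; the linear objective is instead optimized subject to polynomial constraints, which is the formulation ROSA builds on.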