A Policy Gradient Method for Confounded POMDPs

May 26, 2023

Mao Hong, Zhengling Qi, Yanxun Xu

From arXiv, 84 pages, 1 figure

In this paper, we propose a policy gradient method for confounded partially observable Markov decision processes (POMDPs) with continuous state and observation spaces in the offline setting. We first establish a novel identification result that allows any history-dependent policy gradient under POMDPs to be estimated non-parametrically from offline data. This identification result lets us solve a sequence of conditional moment restrictions and adopt a min-max learning procedure with general function approximation to estimate the policy gradient. We then provide a finite-sample, non-asymptotic bound for estimating the gradient uniformly over a pre-specified policy class in terms of the sample size, the horizon length, the concentrability coefficient, and the measure of ill-posedness in solving the conditional moment restrictions. Lastly, by deploying the proposed gradient estimation in a gradient ascent algorithm, we show the global convergence of the proposed algorithm in finding the history-dependent optimal policy under some technical conditions. To the best of our knowledge, this is the first work studying the policy gradient method for POMDPs under the offline setting.
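
To make the phrase "solving a conditional moment restriction by min-max learning" concrete, here is a minimal toy sketch in Python. It is not the paper's estimator: it assumes a single restriction E[Y - theta*X | Z] = 0, scalar linear function classes h(x) = theta*x and f(z) = beta*z, and a simulated data set in which an unobserved confounder U biases ordinary least squares. The variable Z, the data-generating process and all function names are hypothetical choices made for illustration only.

```python
import random


def simulate(n, seed=0):
    """Toy confounded data (an assumption of this sketch, not from the paper)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)                 # unobserved confounder
        z = rng.gauss(0.0, 1.0)                 # observed variable independent of u
        x = z + u + rng.gauss(0.0, 0.5)         # regressor, affected by u
        y = 2.0 * x + u + rng.gauss(0.0, 0.1)   # outcome; true slope is 2
        data.append((x, z, y))
    return data


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def inner_max_value(theta, data):
    """Value of max over beta of
    mean((y - theta*x) * beta*z) - 0.5 * mean((beta*z)**2).
    The maximum is attained at beta = m / b, which gives m**2 / (2*b)."""
    m = mean(z * (y - theta * x) for x, z, y in data)
    b = mean(z * z for _, z, _ in data)
    return m * m / (2.0 * b)


def minimax_slope(data):
    """Minimizer over theta of inner_max_value: it sets the moment m to zero."""
    return mean(z * y for _, z, y in data) / mean(z * x for x, z, _ in data)


def ols_slope(data):
    """Least-squares slope without intercept; biased here because of u."""
    return mean(x * y for x, _, y in data) / mean(x * x for x, _, _ in data)


data = simulate(20_000)
theta_hat = minimax_slope(data)
theta_ols = ols_slope(data)
print("min-max slope:", theta_hat)
print("OLS slope:", theta_ols)
```

In this simplified setting the inner maximization has a closed form, so the saddle point reduces to setting an empirical moment to zero. With general function approximation, as the abstract describes, both players would typically be optimized numerically instead.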
