This paper introduces a simple and efficient learning algorithm for general sequential decision making. The algorithm combines Optimism for exploration with Maximum Likelihood Estimation for model estimation, and is thus named OMLE. We prove that OMLE learns near-optimal policies for an enormously rich class of sequential decision making problems with a polynomial number of samples. This rich class includes not only a majority of known tractable model-based Reinforcement Learning (RL) problems (such as tabular MDPs, factored MDPs, low witness rank problems, tabular weakly-revealing/observable POMDPs, and multi-step decodable POMDPs), but also many new challenging RL problems, especially in the partially observable setting, that were not previously known to be tractable. Notably, the new problems addressed by this paper include (1) observable POMDPs with continuous observations and function approximation, where we achieve the first sample complexity that is completely independent of the size of the observation space; (2) well-conditioned low-rank sequential decision making problems (also known as Predictive State Representations (PSRs)), which include and generalize all known tractable POMDP examples under a more intrinsic representation; (3) general sequential decision making problems under the SAIL condition, which unifies our existing understandings of model-based RL in both fully observable and partially observable settings. The SAIL condition, identified in this paper, can be viewed as a natural generalization of Bellman/witness rank to address partial observability.
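To illustrate the optimism-plus-MLE recipe at a glance, the following is a minimal toy sketch, not the paper's actual algorithm: the two-armed Bernoulli bandit instance, the finite candidate model class, and the confidence width `beta` are all illustrative assumptions. Each round, the learner forms an MLE-based confidence set of models and acts optimistically with respect to it.

```python
import math
import random

random.seed(0)

# Toy two-armed Bernoulli bandit. Candidate models are (mean of arm 0, mean of arm 1).
# These specific values, and the true model below, are illustrative assumptions.
MODELS = [(0.2, 0.8), (0.8, 0.2), (0.5, 0.5)]
TRUE = (0.2, 0.8)  # unknown to the learner

def loglik(model, data):
    """Log-likelihood of the observed (arm, reward) pairs under a candidate model."""
    ll = 0.0
    for arm, reward in data:
        p = model[arm]
        ll += math.log(p if reward == 1 else 1 - p)
    return ll

def omle(rounds=300, beta=4.0):
    data = []
    for _ in range(rounds):
        # 1) MLE confidence set: models whose log-likelihood is within beta of the max.
        lls = [loglik(m, data) for m in MODELS]
        best = max(lls)
        conf = [m for m, ll in zip(MODELS, lls) if ll >= best - beta]
        # 2) Optimism: pick the (model, arm) pair with the highest predicted reward
        #    over the confidence set.
        _, arm = max(((m, a) for m in conf for a in (0, 1)),
                     key=lambda pair: pair[0][pair[1]])
        # 3) Execute the optimistic action and record the outcome.
        reward = 1 if random.random() < TRUE[arm] else 0
        data.append((arm, reward))
    return data

data = omle()
# Misspecified models are eventually excluded from the confidence set,
# so later rounds should mostly play the truly better arm (arm 1).
late_arms = [arm for arm, _ in data[-100:]]
```

In the full OMLE algorithm the model class is a general class of sequential decision making models and the "arm" is an entire policy, but the structure of the loop (confidence set via likelihood, then optimistic planning) is the same.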