Optimistic Planning by Regularized Dynamic Programming - 专知 (Zhuanzhi) Papers

dynamic programming · regularization term · approximation · linear · functional

March 3, 2023

Optimistic Planning by Regularized Dynamic Programming

Antoine Moulin, Gergely Neu

We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This technique allows us to avoid contraction and monotonicity arguments that are typically required by existing analyses of approximate dynamic programming methods, and in particular to use approximate transition functions estimated via least-squares procedures in MDPs with linear function approximation. We use our method to provide a computationally efficient algorithm for learning near-optimal policies in discounted linear kernel MDPs from a single stream of experience, and show that it achieves near-optimal statistical guarantees.
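To make the idea of "adding regularization to the updates of value iteration" concrete, the sketch below runs value iteration on a small tabular MDP where the hard maximum over actions is replaced by a smooth, entropy-regularized maximum. This is only an illustration under stated assumptions, not the authors' algorithm: the abstract does not specify the regularizer, so entropy regularization is assumed here, and the optimistic exploration bonuses, least-squares transition estimates, and linear features mentioned in the abstract are all omitted. The function names, the parameter `eta`, and the two-state example MDP are hypothetical.

```python
import math


def q_values(P, r, V, gamma):
    """One-step lookahead: Q[s][a] = r[s][a] + gamma * E[V(next state)]."""
    return [
        [r[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))
         for a in range(len(P[s]))]
        for s in range(len(P))
    ]


def soft_max_value(q, eta):
    """Smooth maximum (1/eta) * log(sum_a exp(eta * q[a])), computed stably."""
    m = max(q)
    return m + math.log(sum(math.exp(eta * (x - m)) for x in q)) / eta


def soft_policy(q, eta):
    """Softmax policy with probabilities proportional to exp(eta * q[a])."""
    m = max(q)
    w = [math.exp(eta * (x - m)) for x in q]
    z = sum(w)
    return [x / z for x in w]


def regularized_value_iteration(P, r, gamma, eta, iters=500):
    """Value iteration with an entropy-regularized (log-sum-exp) backup.

    P[s][a] is a list of next-state probabilities, r[s][a] is a reward,
    gamma is the discount factor, and eta > 0 controls the regularization
    (a larger eta brings the backup closer to the usual hard maximum).
    """
    V = [0.0] * len(P)
    for _ in range(iters):
        Q = q_values(P, r, V, gamma)
        V = [soft_max_value(q, eta) for q in Q]
    Q = q_values(P, r, V, gamma)
    return V, [soft_policy(q, eta) for q in Q]


if __name__ == "__main__":
    # Hypothetical two-state MDP: only staying in state 1 yields reward 1.
    P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: action 0 stays, action 1 moves to 1
         [[0.0, 1.0], [1.0, 0.0]]]   # state 1: action 0 stays, action 1 moves to 0
    r = [[0.0, 0.0], [1.0, 0.0]]
    V, pi = regularized_value_iteration(P, r, gamma=0.9, eta=50.0)
    print(V, pi)
```

With a finite `eta` the resulting policy is stochastic and the values slightly exceed the unregularized optimum; the gap is at most `log(num_actions) / (eta * (1 - gamma))`, which is the usual price of this kind of smoothing.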
