A Low-rank Approximation for MDPs via Moment Coupling - 专知论文


PDE · Moments · Approximation · Moment matching · Reducible

April 9, 2021

A Low-rank Approximation for MDPs via Moment Coupling


Amy B. Z. Zhang, Itai Gurvich

We introduce a framework to approximate a Markov Decision Process that stands on two pillars: state aggregation -- as the algorithmic infrastructure; and central-limit-theorem-type approximations -- as the mathematical underpinning of optimality guarantees. The theory is grounded in recent work by Braverman et al. (2020) that relates the solution of the Bellman equation to that of a PDE where, in the spirit of the central limit theorem, the transition matrix is reduced to its local first and second moments. Solving the PDE is $\textit{not}$ required by our method. Instead, we construct a "sister" (controlled) Markov chain whose two local transition moments are approximately identical to those of the focal chain. Because of this $\textit{moment matching}$, the original chain and its "sister" are coupled through the PDE, a coupling that facilitates optimality guarantees. Embedded into standard soft aggregation algorithms, moment matching provides a disciplined mechanism to tune the aggregation and disaggregation probabilities. The computational gains arise from reducing the effective state space from $N$ to $N^{\frac{1}{2}+\epsilon}$, as one might intuitively expect from approximations grounded in the central limit theorem.
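The idea of matching two local transition moments can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a toy one-dimensional chain, uses the uncentered second moment, and builds a three-point "sister" step on a coarser grid of spacing `h`. The function names `local_moments` and `sister_step`, the toy chain, and the choice `h = 2` are all hypothetical and chosen only for illustration.

```python
def local_moments(P, x):
    """First and (uncentered) second moment of the jump y - x from state x."""
    mu = sum(p * (y - x) for y, p in enumerate(P[x]))
    sigma2 = sum(p * (y - x) ** 2 for y, p in enumerate(P[x]))
    return mu, sigma2


def sister_step(mu, sigma2, h):
    """Probabilities (down, stay, up) for jumps of size -h, 0, +h whose
    first two moments equal mu and sigma2. Raises if no such step exists."""
    p_up = (sigma2 + h * mu) / (2 * h * h)
    p_down = (sigma2 - h * mu) / (2 * h * h)
    p_stay = 1.0 - p_up - p_down
    if min(p_up, p_down, p_stay) < 0:
        raise ValueError("moments cannot be matched with spacing h")
    return p_down, p_stay, p_up


# Toy focal chain (an assumption): lazy walk on {0, ..., 8} with upward drift.
N = 9
P = [[0.0] * N for _ in range(N)]
for x in range(N):
    P[x][max(x - 1, 0)] += 0.3
    P[x][x] += 0.3
    P[x][min(x + 1, N - 1)] += 0.4

mu, sigma2 = local_moments(P, 4)          # moments at an interior state
down, stay, up = sister_step(mu, sigma2, h=2)

# The coarse step reproduces both local moments of the focal chain.
print("focal :", mu, sigma2)
print("sister:", 2 * (up - down), 4 * (up + down))
```

The sister step moves on a grid with half as many points, yet its drift and second moment at the chosen state agree with the focal chain's. When the requested spacing is too coarse for the given moments, the probabilities would turn negative, and the sketch raises an error instead of returning an invalid step.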


