We introduce a framework to approximate a Markov Decision Process that stands on two pillars: state aggregation, as the algorithmic infrastructure, and central-limit-theorem-type approximations, as the mathematical underpinning of optimality guarantees. The theory is grounded in recent work by Braverman et al. (2020) that relates the solution of the Bellman equation to that of a PDE where, in the spirit of the central limit theorem, the transition matrix is reduced to its local first and second moments. Solving the PDE is $\textit{not}$ required by our method. Instead, we construct a "sister" (controlled) Markov chain whose two local transition moments are approximately identical to those of the focal chain. Because of this $\textit{moment matching}$, the original chain and its "sister" are coupled through the PDE, a coupling that facilitates optimality guarantees. Embedded into standard soft aggregation algorithms, moment matching provides a disciplined mechanism to tune the aggregation and disaggregation probabilities. The computational gains arise from the reduction of the effective state space from $N$ to $N^{\frac{1}{2}+\epsilon}$, as one might intuitively expect from approximations grounded in the central limit theorem.
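To make the notion of local moments concrete, the following is a minimal sketch rather than the construction used in the paper. It assumes a one-dimensional, uncontrolled toy chain on $\{0, \dots, N-1\}$, computes the local first and second moments of each row of its transition matrix, and then picks the jump probabilities of a hypothetical "sister" walk on a coarser grid of spacing `h` so that both moments agree. The names `local_moments`, `birth_death_chain`, and `match_two_moments`, and the values of `up`, `down`, `h`, and `eps`, are illustrative assumptions.

```python
import numpy as np


def local_moments(P, states):
    """Local first and second moments of each row of a transition matrix.

    mu[x]     = sum_y P[x, y] * (y - x)
    sigma2[x] = sum_y P[x, y] * (y - x) ** 2
    """
    diff = states[None, :] - states[:, None]  # diff[x, y] = y - x
    mu = (P * diff).sum(axis=1)
    sigma2 = (P * diff ** 2).sum(axis=1)
    return mu, sigma2


def birth_death_chain(n, up=0.3, down=0.2):
    """Toy focal chain (an assumption): lazy nearest-neighbour walk on 0..n-1."""
    P = np.zeros((n, n))
    for x in range(n):
        if x + 1 < n:
            P[x, x + 1] = up
        if x - 1 >= 0:
            P[x, x - 1] = down
        P[x, x] = 1.0 - P[x].sum()  # remaining mass stays put
    return P


def match_two_moments(mu, sigma2, h):
    """Jump probabilities of a walk with steps +h, -h, 0 matching (mu, sigma2).

    Solves  h * (p_up - p_down) = mu  and  h**2 * (p_up + p_down) = sigma2.
    The result is a valid distribution only if |mu| * h <= sigma2 <= h**2.
    """
    p_up = 0.5 * (sigma2 / h ** 2 + mu / h)
    p_down = 0.5 * (sigma2 / h ** 2 - mu / h)
    p_stay = 1.0 - p_up - p_down
    return p_up, p_down, p_stay


n = 100
states = np.arange(n, dtype=float)
P = birth_death_chain(n)
mu, sigma2 = local_moments(P, states)

# Sister walk on a grid four times coarser, matched at an interior state.
h = 4.0
p_up, p_down, p_stay = match_two_moments(mu[50], sigma2[50], h)

# Number of meta-states suggested by the N^(1/2 + eps) scaling (eps is hypothetical).
eps = 0.1
n_meta = int(np.ceil(n ** (0.5 + eps)))
```

In this toy setting the coarse walk moves less often but in larger steps, so its drift and second moment per transition coincide with those of the focal chain. The validity condition in `match_two_moments` limits how coarse the grid can be for a given pair of moments.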