Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks

We consider the off-policy evaluation problem of reinforcement learning using deep convolutional neural networks. We analyze the deep fitted Q-evaluation method for estimating the expected cumulative reward of a target policy when the data are generated from an unknown behavior policy. We show that, by choosing the network size appropriately, one can leverage any low-dimensional manifold structure in the Markov decision process and obtain a sample-efficient estimator without suffering from the curse of high ambient data dimensionality. Specifically, we establish a sharp error bound for fitted Q-evaluation, which depends on the intrinsic dimension of the state-action space, the smoothness of the Bellman operator, and a function class-restricted $\chi^2$-divergence. Notably, the restricted $\chi^2$-divergence measures the mismatch between the behavior and target policies in the function space, which can be small even if the two policies are not close to each other in their tabular forms. We also develop a novel approximation result for convolutional neural networks in Q-function estimation. Numerical experiments are provided to support our theoretical analysis.
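
To make the fitted Q-evaluation procedure named in the abstract concrete, the following is a minimal sketch under stated assumptions, not the paper's implementation. It assumes a finite-horizon, undiscounted setting and swaps the deep convolutional network for a linear function class fitted by least squares. The function name `fitted_q_evaluation` and its arguments (`features`, `rewards`, `next_features_pi`, `horizon`) are hypothetical and do not come from the source.

```python
import numpy as np


def fitted_q_evaluation(features, rewards, next_features_pi, horizon):
    """Sketch of fitted Q-evaluation with a linear function class (an assumption).

    features:         (n, d) array, feature vector of each logged (state, action).
    rewards:          (n,) array, observed reward for each logged transition.
    next_features_pi: (n, d) array, feature vector of each next state averaged
                      over the target policy's action distribution.
    horizon:          number of backward regression steps.

    Returns a weight vector w such that features @ w approximates the
    target policy's Q-function with `horizon` steps remaining.
    """
    # Start from the zero function: no reward remains after the horizon.
    w = np.zeros(features.shape[1])
    for _ in range(horizon):
        # Regression target: reward plus the current Q estimate at the next
        # state, with the next action drawn from the target policy.
        targets = rewards + next_features_pi @ w
        # Least-squares fit over the logged (behavior-policy) data.
        w, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return w
```

Only the regression inputs come from the behavior policy's data; the target policy enters solely through `next_features_pi`. In the deep setting described in the abstract, the least-squares step would be replaced by training a network on the same regression targets.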

