Polyak-Ruppert Averaged Q-Learning is Statistically Efficient - 专知论文 (Zhuanzhi Papers)

Topics: statistical efficiency · statistic · optimizer · variance · sample complexity

December 29, 2021

Polyak-Ruppert Averaged Q-Learning is Statistically Efficient

Xiang Li, Wenhao Yang, Zhihua Zhang, Michael I. Jordan

We study synchronous Q-learning with Polyak-Ruppert averaging (a.k.a. averaged Q-learning) in a $\gamma$-discounted MDP. We establish asymptotic normality for the averaged iterate $\bar{\boldsymbol{Q}}_T$. Furthermore, we show that $\bar{\boldsymbol{Q}}_T$ is a regular asymptotically linear (RAL) estimator for the optimal Q-value function $\boldsymbol{Q}^*$ with the most efficient influence function. This implies that the averaged Q-learning iterate has the smallest asymptotic variance among all RAL estimators. In addition, we present a non-asymptotic analysis of the $\ell_{\infty}$ error $\mathbb{E}\|\bar{\boldsymbol{Q}}_T-\boldsymbol{Q}^*\|_{\infty}$, showing that it matches the instance-dependent lower bound as well as the optimal minimax complexity lower bound. As a byproduct, we find that the Bellman noise has sub-Gaussian coordinates with variance $\mathcal{O}((1-\gamma)^{-1})$ instead of the prevailing $\mathcal{O}((1-\gamma)^{-2})$ under the standard bounded reward assumption. The sub-Gaussian result has the potential to improve the sample complexity of many RL algorithms. In short, our theoretical analysis shows that averaged Q-learning is statistically efficient.
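The procedure the abstract describes can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a small tabular MDP given as a transition tensor `P` of shape `(S, A, S)` and a reward matrix `R` of shape `(S, A)`, a generative model that draws one next state for every state-action pair at each step, and a polynomial step size $\eta_t = t^{-\alpha}$. The function names, the value of `alpha`, and the value-iteration reference solver are all illustrative choices that do not come from the abstract.

```python
import numpy as np


def value_iteration(P, R, gamma, tol=1e-10, max_iter=100_000):
    """Reference Q* obtained by iterating the exact Bellman optimality operator."""
    Q = np.zeros_like(R, dtype=float)
    for _ in range(max_iter):
        # (S, A, S) @ (S,) -> (S, A): expected optimal value of the next state.
        Q_next = R + gamma * P @ Q.max(axis=1)
        if np.max(np.abs(Q_next - Q)) < tol:
            return Q_next
        Q = Q_next
    return Q


def averaged_q_learning(P, R, gamma, T, alpha=0.75, seed=0):
    """Synchronous Q-learning with a running Polyak-Ruppert average.

    Returns the last iterate Q_T and the average of Q_1, ..., Q_T.
    The step size t ** (-alpha) is an assumed schedule for this sketch.
    """
    rng = np.random.default_rng(seed)
    S, A = R.shape
    cdf = np.cumsum(P, axis=2)  # per-(s, a) cumulative transition probabilities
    Q = np.zeros((S, A))
    Q_bar = np.zeros((S, A))
    for t in range(1, T + 1):
        # Synchronous sampling: one next state for every (s, a) pair,
        # drawn by inverting the cumulative distribution.
        u = rng.random((S, A, 1))
        next_state = np.minimum((u > cdf).sum(axis=2), S - 1)
        # Empirical Bellman operator applied to the current iterate.
        target = R + gamma * Q.max(axis=1)[next_state]
        eta = t ** (-alpha)
        Q = (1.0 - eta) * Q + eta * target
        # Running mean of the iterates Q_1, ..., Q_t.
        Q_bar += (Q - Q_bar) / t
    return Q, Q_bar
```

On a small randomly generated MDP, comparing `np.max(np.abs(Q_bar - Q_star))` with the same quantity for the last iterate gives a quick empirical look at the $\ell_{\infty}$ error that the abstract analyzes. A single run like this is only an illustration and does not verify the efficiency claims.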
