Variance-Reduced Off-Policy TDC Learning: Non-Asymptotic Convergence Analysis


Variance reduction · TD · Scenario · Markovian · Sample complexity

May 22, 2023


Shaocong Ma, Yi Zhou, Shaofeng Zou

From arXiv; accepted for publication at NeurIPS 2020

Variance reduction techniques have been successfully applied to temporal-difference (TD) learning and help to improve the sample complexity in policy evaluation. However, existing work applied variance reduction either to the less popular one time-scale TD algorithm or to the two time-scale GTD algorithm with only a finite number of i.i.d. samples, and both algorithms apply only to the on-policy setting. In this work, we develop a variance reduction scheme for the two time-scale TDC algorithm in the off-policy setting and analyze its non-asymptotic convergence rate over both i.i.d. and Markovian samples. In the i.i.d. setting, our algorithm matches the best-known lower bound $\tilde{O}(\epsilon^{-1})$. In the Markovian setting, our algorithm achieves the state-of-the-art sample complexity $O(\epsilon^{-1} \log \epsilon^{-1})$, which is near-optimal. Experiments demonstrate that the proposed variance-reduced TDC achieves a smaller asymptotic convergence error than both the conventional TDC and the variance-reduced TD.
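To make the idea in the abstract concrete, the following is a minimal sketch of how an SVRG-style variance reduction step can be combined with the two time-scale off-policy TDC update under linear function approximation. It is an illustration under stated assumptions, not the authors' algorithm: the function names (`tdc_terms`, `tdc_directions`, `vr_tdc_epoch`), the uniform i.i.d. sampling from a fixed batch, the omission of any projection step, and the choice of the last inner iterate as the epoch output are all assumptions made here and may differ from the paper.

```python
import numpy as np


def tdc_terms(phi, phi_next, reward, rho, gamma):
    """Per-sample matrices of the linear off-policy TDC update.

    phi, phi_next: feature vectors of the current and next state.
    rho: importance sampling ratio (target policy / behavior policy).
    """
    A = rho * np.outer(phi, gamma * phi_next - phi)
    B = -gamma * rho * np.outer(phi_next, phi)
    C = -np.outer(phi, phi)
    b = rho * reward * phi
    return A, B, C, b


def tdc_directions(terms, theta, w):
    """Update directions for the slow (theta) and fast (w) iterates."""
    A, B, C, b = terms
    g = A @ theta + b + B @ w  # slow time scale: value-function weights
    h = A @ theta + b + C @ w  # fast time scale: auxiliary weights
    return g, h


def vr_tdc_epoch(samples, theta_ref, w_ref, alpha, beta, rng):
    """One SVRG-style epoch (assumed form; no projection step).

    samples: list of tuples returned by tdc_terms.
    theta_ref, w_ref: reference point held fixed during the epoch.
    alpha, beta: step sizes for theta and w.
    """
    # Batch-averaged directions evaluated at the reference point.
    ref = [tdc_directions(s, theta_ref, w_ref) for s in samples]
    g_bar = np.mean([g for g, _ in ref], axis=0)
    h_bar = np.mean([h for _, h in ref], axis=0)

    theta, w = theta_ref.copy(), w_ref.copy()
    for _ in range(len(samples)):
        i = rng.integers(len(samples))  # uniform draw: i.i.d. setting
        g, h = tdc_directions(samples[i], theta, w)
        g_ref, h_ref = ref[i]
        # Variance-reduced directions: the sample direction is corrected
        # by its value at the reference point plus the batch average.
        theta, w = (theta + alpha * (g - g_ref + g_bar),
                    w + beta * (h - h_ref + h_bar))
    return theta, w
```

In this sketch, the corrected direction `g - g_ref + g_bar` has the same batch average as the plain TDC direction, while its variance shrinks as the iterate approaches the reference point. Calling `vr_tdc_epoch` repeatedly, each time passing the returned pair as the next reference point, gives the usual outer loop of an SVRG-type method.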
