Non-Asymptotic Analysis for Two Time-scale TDC with General Smooth Function Approximation


Functional · Approximation · Smoothness · Tangent plane · Policy evaluation

September 30, 2021


Yue Wang, Shaofeng Zou, Yi Zhou

From arXiv; accepted by NeurIPS 2021.

Temporal-difference learning with gradient correction (TDC) is a two time-scale algorithm for policy evaluation in reinforcement learning. It was initially proposed with linear function approximation and later extended to general smooth function approximation. Asymptotic convergence in the on-policy setting with general smooth function approximation was established in [bhatnagar2009convergent]; however, the finite-sample analysis remains unsolved due to challenges in the non-linear, two time-scale update structure, the non-convex objective function, and the time-varying projection onto a tangent plane. In this paper, we develop novel techniques to explicitly characterize the finite-sample error bound for the general off-policy setting with i.i.d. or Markovian samples, and show that it converges as fast as $\mathcal O(1/\sqrt T)$ (up to a factor of $\mathcal O(\log T)$). Our approach can be applied to a wide range of value-based reinforcement learning algorithms with general smooth function approximation.
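To make the two time-scale structure concrete, here is a minimal sketch of one TDC update in the linear special case, written in a commonly cited textbook form. It is not the paper's algorithm for general smooth function approximation: there is no tangent-plane projection here, and the function name `tdc_linear_step`, the helper `dot`, and all default step sizes are assumptions chosen for illustration. The main parameters `theta` move on the slow time scale (step size `alpha`), while the auxiliary weights `w` move on the fast time scale (step size `beta`).

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def tdc_linear_step(
    theta: Vector,        # slow-time-scale value-function parameters
    w: Vector,            # fast-time-scale auxiliary (correction) weights
    phi: Vector,          # feature vector of the current state
    phi_next: Vector,     # feature vector of the next state
    reward: float,
    rho: float = 1.0,     # importance-sampling ratio (1.0 when on-policy)
    gamma: float = 0.9,   # discount factor (illustrative default)
    alpha: float = 0.01,  # slow step size (illustrative default)
    beta: float = 0.1,    # fast step size (illustrative default)
) -> Tuple[Vector, Vector]:
    """One linear TDC update; returns the new (theta, w)."""
    # TD error under the current linear value estimate.
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    # Scalar phi^T w, used by the gradient-correction term.
    correction = dot(phi, w)
    # Slow update: TD step minus the gradient-correction term.
    new_theta = [
        th + alpha * rho * (delta * p - gamma * correction * pn)
        for th, p, pn in zip(theta, phi, phi_next)
    ]
    # Fast update: w tracks the projection of the TD error onto the features.
    new_w = [wi + beta * rho * (delta - correction) * p for wi, p in zip(w, phi)]
    return new_theta, new_w
```

A typical use is to call `tdc_linear_step` once per observed transition, carrying `theta` and `w` forward, with `beta` chosen larger than `alpha` so that `w` adapts faster than `theta`.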

