Value functions are central to Dynamic Programming and Reinforcement Learning, but their exact estimation suffers from the curse of dimensionality, challenging the development of practical value-function (VF) estimation algorithms. Several approaches have been proposed to overcome this issue, ranging from non-parametric schemes that aggregate states or actions to parametric approximations of state and action VFs via, e.g., linear estimators or deep neural networks. Notably, several high-dimensional state problems can be well approximated by an intrinsic low-rank structure. Motivated by this fact, and leveraging results from low-rank optimization, this paper proposes different stochastic algorithms to estimate a low-rank factorization of the $Q(s, a)$ matrix. This is a non-parametric alternative to VF approximation that dramatically reduces the computational and sample complexities relative to classical $Q$-learning methods, which estimate $Q(s,a)$ separately for each state-action pair.
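The abstract does not spell out the update rule, but the core idea of stochastically fitting a rank-$k$ factorization $Q \approx L R^\top$ can be illustrated with a minimal sketch. The following Python snippet is an assumption-laden illustration (not the paper's algorithm): it applies a TD-error-driven stochastic gradient update to only the rows of the factors $L \in \mathbb{R}^{|S| \times k}$ and $R \in \mathbb{R}^{|A| \times k}$ touched by each observed transition, so only $k(|S| + |A|)$ parameters are maintained instead of $|S||A|$ entries.

```python
import numpy as np

def lowrank_q_update(L, R, s, a, r, s_next, gamma=0.99, lr=0.01):
    """One hypothetical stochastic update of the rank-k factors L (|S| x k)
    and R (|A| x k), so that Q(s, a) is approximated by L[s] @ R[a]."""
    # TD target computed from the current low-rank estimate of Q(s', .)
    q_next = L[s_next] @ R.T          # estimated Q-values for all actions in s'
    target = r + gamma * np.max(q_next)
    # TD error at the visited state-action pair
    td_err = target - L[s] @ R[a]
    # Gradient-style updates of only the rows involved in the transition
    L_s_old = L[s].copy()
    L[s] += lr * td_err * R[a]
    R[a] += lr * td_err * L_s_old
    return td_err

# Usage: synthetic transitions just to exercise the update rule
rng = np.random.default_rng(0)
n_states, n_actions, k = 100, 10, 5
L = 0.1 * rng.standard_normal((n_states, k))
R = 0.1 * rng.standard_normal((n_actions, k))
for _ in range(1000):
    s = rng.integers(n_states)
    a = rng.integers(n_actions)
    s_next = rng.integers(n_states)
    r = rng.standard_normal()
    lowrank_q_update(L, R, s, a, r, s_next)
```

The sample-complexity benefit alluded to in the abstract comes from parameter sharing: a single transition updates an entire row of each factor, so information propagates across all state-action pairs that share those rows, unlike tabular $Q$-learning where each $(s,a)$ entry is learned in isolation.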