Provably adaptive reinforcement learning in metric spaces - 专知 (Zhuanzhi) papers


October 20, 2021

Provably adaptive reinforcement learning in metric spaces


Tongyi Cao, Akshay Krishnamurthy

From arXiv. Published in NeurIPS 2020. This version fixes a bug in the published version.

We study reinforcement learning in continuous state and action spaces endowed with a metric. We provide a refined analysis of a variant of the algorithm of Sinclair, Banerjee, and Yu (2019) and show that its regret scales with the zooming dimension of the instance. This parameter, which originates in the bandit literature, captures the size of the subsets of near-optimal actions and is always smaller than the covering dimension used in previous analyses. As such, our results are the first provably adaptive guarantees for reinforcement learning in metric spaces.
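To make the contrast between the covering dimension and the zooming dimension concrete, the following is a minimal sketch on a one-dimensional bandit-style toy problem, not the paper's algorithm or its exact definition. The reward function `f(x) = 1 - |x - 0.5|`, the grid resolution, and the helper names `covering_number`, `gap`, and `compare_scales` are all assumptions made for illustration, and the near-optimal set is simplified to points whose gap is at most `r` (the usual definition allows a constant factor). It counts how many radius-`r` intervals are needed to cover the whole space versus only the near-optimal points.

```python
import math


def covering_number(points, r):
    """Greedy count of radius-r intervals needed to cover 1-D points."""
    count = 0
    covered_up_to = -math.inf
    for x in sorted(points):
        if x > covered_up_to:
            # Place an interval centred at x + r; it covers [x, x + 2r].
            covered_up_to = x + 2 * r
            count += 1
    return count


def gap(x):
    # Hypothetical reward f(x) = 1 - |x - 0.5|, maximised at x = 0.5,
    # so the suboptimality gap of x is |x - 0.5|.
    return abs(x - 0.5)


def compare_scales(scales, resolution=10000):
    """Return (r, cover of whole space, cover of near-optimal set) per scale."""
    grid = [i / resolution for i in range(resolution + 1)]
    rows = []
    for r in scales:
        # Simplified near-optimal set at scale r: points with gap at most r.
        near_optimal = [x for x in grid if gap(x) <= r]
        rows.append((r, covering_number(grid, r), covering_number(near_optimal, r)))
    return rows


if __name__ == "__main__":
    for r, full, near in compare_scales((0.1, 0.01, 0.001)):
        print(f"r={r}: whole space needs {full}, near-optimal set needs {near}")
```

In this toy instance the cover of the whole interval grows roughly like 1/r (covering dimension 1), while the cover of the near-optimal set stays bounded by a constant (zooming dimension 0), which is the kind of gap an adaptive regret bound can exploit.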



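Related content

Reinforcement learning

Reinforcement learning (RL) is an area of machine learning concerned with how software agents should take actions in an environment in order to maximize cumulative reward. Alongside supervised and unsupervised learning, it is one of the three basic machine learning paradigms. RL differs from supervised learning in that it does not need labelled input/output pairs to be presented, nor sub-optimal actions to be explicitly corrected. Instead, the focus is on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The environment is typically stated as a Markov decision process (MDP), because many RL algorithms for this setting use dynamic programming techniques. The main difference between classical dynamic programming methods and RL algorithms is that the latter do not assume an exact mathematical model of the MDP, and they target large MDPs where exact methods are infeasible.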
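The difference between dynamic programming and model-free RL can be sketched on a tiny example. Everything below is a hypothetical illustration rather than anything from the paper: the two-state MDP, the function names, and the hyperparameters are assumptions. Value iteration is given the model (it may query `step` for every state-action pair), whereas tabular Q-learning only sees sampled transitions and uses epsilon-greedy action selection to trade off exploration and exploitation.

```python
import random

# Hypothetical 2-state, 2-action MDP: action a moves the agent to state a,
# and only choosing action 1 while in state 1 pays a reward of 1.
N_STATES, N_ACTIONS, GAMMA = 2, 2, 0.9


def step(state, action):
    """Deterministic dynamics: returns (next_state, reward)."""
    reward = 1.0 if (state == 1 and action == 1) else 0.0
    return action, reward


def backups(values, state):
    """One-step lookahead value of every action in `state`."""
    result = []
    for action in range(N_ACTIONS):
        next_state, reward = step(state, action)
        result.append(reward + GAMMA * values[next_state])
    return result


def value_iteration(sweeps=500):
    """Dynamic programming: sweeps over all states using the known model."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        values = [max(backups(values, s)) for s in range(N_STATES)]
    policy = []
    for s in range(N_STATES):
        q_values = backups(values, s)
        policy.append(q_values.index(max(q_values)))
    return values, policy


def q_learning(steps=20000, alpha=0.5, epsilon=0.2, seed=0):
    """Model-free learning from sampled transitions only."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    state = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(N_ACTIONS)  # explore
        else:
            action = q[state].index(max(q[state]))  # exploit
        next_state, reward = step(state, action)
        target = reward + GAMMA * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return [row.index(max(row)) for row in q]


if __name__ == "__main__":
    values, dp_policy = value_iteration()
    print("value iteration:", values, dp_policy)
    print("q-learning greedy policy:", q_learning())
```

On an MDP this small both approaches are cheap; the point of the sketch is only that Q-learning never reads the transition or reward tables directly, which is what lets such methods scale to settings where an exact model is unavailable.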


