Value iteration (VI) is a foundational dynamic programming method, important for learning and planning in optimal control and reinforcement learning. VI proceeds in batches, where the update to the value of each state must be completed before the next batch of updates can begin. Completing a single batch is prohibitively expensive if the state space is large, rendering VI impractical for many applications. Asynchronous VI helps to address the large state space problem by updating one state at a time, in place and in an arbitrary order. However, Asynchronous VI still requires a maximization over the entire action space, making it impractical for domains with large action spaces. To address this issue, we propose doubly-asynchronous value iteration (DAVI), a new algorithm that generalizes the idea of asynchrony from states to both states and actions. More concretely, DAVI maximizes over a sampled subset of actions that can be of any user-defined size. This simple approach of using sampling to reduce computation maintains theoretical properties similar to those of VI, without the need to wait for a full sweep through the entire action space in each update. In this paper, we show that DAVI converges to the optimal value function with probability one, converges at a near-geometric rate with probability 1 − δ, and returns a near-optimal policy in computation time that nearly matches a previously established bound for VI. We also empirically demonstrate DAVI's effectiveness in several experiments.
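To make the double asynchrony concrete, below is a minimal tabular sketch of the kind of update the abstract describes: a single state is selected and updated in place, and the maximization is taken over a randomly sampled subset of actions rather than the full action set. This is an illustration under stated assumptions, not the paper's exact pseudocode: the MDP representation (transition tensor P[s, a, s'] and reward matrix R[s, a]), the uniform sampling of states and actions, and the choice to keep the current value estimate inside the maximum are all assumptions made for this sketch.

```python
import numpy as np

def davi_sketch(P, R, gamma, num_updates, action_sample_size, rng=None):
    """Illustrative sketch of a doubly-asynchronous value-iteration update.

    Assumptions (not taken from the paper's pseudocode):
      P   -- transition tensor of shape (S, A, S), P[s, a, s'] = Pr(s' | s, a)
      R   -- reward matrix of shape (S, A)
      States are sampled uniformly; actions are sampled uniformly without
      replacement; the current estimate V[s] is kept inside the max.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_states, num_actions = R.shape
    V = np.zeros(num_states)

    for _ in range(num_updates):
        # Asynchrony over states: pick one state and update it in place.
        s = rng.integers(num_states)
        # Asynchrony over actions: maximize only over a sampled action subset
        # of user-defined size, instead of sweeping the full action space.
        actions = rng.choice(num_actions, size=action_sample_size, replace=False)
        q = R[s, actions] + gamma * P[s, actions] @ V
        # Keep the current estimate in the max so an unlucky action sample
        # cannot decrease the value (a choice made for this sketch).
        V[s] = max(V[s], q.max())
    return V
```

Per update, this costs O(k·S) work for a sampled subset of k actions rather than O(A·S) for a full maximization, which is the computational saving the abstract refers to when the action space is large.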