解决分布式离线强化学习通信复杂性 (Settling the Communication Complexity for Distributed Offline Reinforcement Learning) - 专知论文

会员服务 ·

0

Bandits · INFORMS · 估计/估计量 · 学成 · 优化器 ·

2022 年 2 月 10 日

Settling the Communication Complexity for Distributed Offline Reinforcement Learning

翻译：解决分布式离线强化学习通信复杂性

Juliusz Krysztof Ziomek,Jun Wang,Yaodong Yang

We study a novel setting in offline reinforcement learning (RL) where a number of distributed machines jointly cooperate to solve the problem but only one single round of communication is allowed and there is a budget constraint on the total number of information (in terms of bits) that each machine can send out. For value function prediction in contextual bandits, and both episodic and non-episodic MDPs, we establish information-theoretic lower bounds on the minimax risk for distributed statistical estimators; this reveals the minimum amount of communication required by any offline RL algorithms. Specifically, for contextual bandits, we show that the number of bits must scale at least as $\Omega(AC)$ to match the centralised minimax optimal rate, where $A$ is the number of actions and $C$ is the context dimension; meanwhile, we reach similar results in the MDP settings. Furthermore, we develop learning algorithms based on least-squares estimates and Monte-Carlo return estimates and provide a sharp analysis showing that they can achieve optimal risk up to logarithmic factors. Additionally, we also show that temporal difference is unable to efficiently utilise information from all available devices under the single-round communication setting due to the initial bias of this method. To our best knowledge, this paper presents the first minimax lower bounds for distributed offline RL problems.

翻译：我们研究的是离线强化学习(RL)的新环境,在离线强化学习(RL)中,一些分布式机器共同合作解决问题,但只允许单轮通信,而且每个机器能够发送的信息总量(按位数计算)在预算上受到限制。对于背景土匪的价值函数预测,以及分数和非分数的 MDP,我们为分布式统计估计器在微缩最大风险上设定了信息理论下限;这显示了任何离线RL算法所需的最低通信量。具体地说,对于背景土匪,我们显示比特数的数量必须至少相当于$\Omega(AC),以与集中式微缩最大最佳速率匹配,其中$A是行动的数量,$是上下文层面;与此同时,我们在MDP设置中也取得了类似的结果。此外,我们开发了基于最小度估计值的学习算法和蒙特-卡洛返回估计值,并提供敏锐的分析,表明它们能够达到最优化的风险,直到对调因素。此外,我们展示了至少达到对调因素的最佳风险的比值比例,我们从最初的最小分析方法上展示了目前的最佳偏差的方法。

0

相关内容

Bandits

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

【DeepMind】基于模型的强化学习，174页ppt，Model-Based Reinforcement Learning

【DeepMind】基于模型的强化学习，174页ppt，Model-Based Reinforcement Learning

专知会员服务

89+阅读 · 2021年1月12日

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【ALT 2019 Tutorials】强化学习的探索性开发（Exploration-Exploitation in Reinforcement Learning）

【ALT 2019 Tutorials】强化学习的探索性开发（Exploration-Exploitation in Reinforcement Learning）

专知会员服务

34+阅读 · 2019年3月21日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【新书发布】原作者MarcG.Bellemare发布315页分布强化学习书籍(DistributionalRL)

【新书发布】原作者MarcG.Bellemare发布315页分布强化学习书籍(DistributionalRL)

深度强化学习实验室

1+阅读 · 2022年1月11日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

移动云计算复杂网络环境下任务粒度的应用划分和调度方法

国家自然科学基金

0+阅读 · 2015年12月31日

氧化应激相关蛋白p66Shc和GDF1在砷介导的心脏毒性中的作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

云计算环境下移动Agent系统信任安全关键技术研究

国家自然科学基金

2+阅读 · 2014年12月31日

面向超多目标优化的分解进化算法

国家自然科学基金

0+阅读 · 2014年12月31日

复杂海况下的海事无人艇路径规划研究

国家自然科学基金

2+阅读 · 2013年12月31日

复杂动态场景中鲁棒的视觉跟踪算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

因果推断的统计方法

国家自然科学基金

26+阅读 · 2011年12月31日

多模式杆式移动机构

国家自然科学基金

1+阅读 · 2011年12月31日

自主车辆的高质量三维场景认知与导航避障控制方法研究

国家自然科学基金

1+阅读 · 2011年12月31日

分布式数据流的集成模式挖掘模型和概念漂移检测算法研究

国家自然科学基金

2+阅读 · 2008年12月31日

A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning

A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning

Arxiv

0+阅读 · 2022年4月20日

A sojourn-based approach to semi-Markov Reinforcement Learning

Arxiv

0+阅读 · 2022年4月20日

When Is Partially Observable Reinforcement Learning Not Scary?

Arxiv

0+阅读 · 2022年4月19日

COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation

Arxiv

0+阅读 · 2022年4月19日

Efficient Bayesian Policy Reuse with a Scalable Observation Model in Deep Reinforcement Learning

Arxiv

0+阅读 · 2022年4月19日

Integrated and Adaptive Guidance and Control for Endoatmospheric Missiles via Reinforcement Learning

Arxiv

0+阅读 · 2022年4月18日

CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning

CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning

Arxiv

0+阅读 · 2022年4月18日

A Reinforcement Learning Approach to Parameter Selection for Distributed Optimal Power Flow

Arxiv

0+阅读 · 2022年4月15日

Testing distributional assumptions of learning algorithms

Arxiv

0+阅读 · 2022年4月14日

Decentralized and Communication-Free Multi-Robot Navigation through Distributed Games

Arxiv

40+阅读 · 2021年9月15日

VIP会员

文章信息

相关主题

估计/估计量

相关VIP内容

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

【DeepMind】基于模型的强化学习，174页ppt，Model-Based Reinforcement Learning

【DeepMind】基于模型的强化学习，174页ppt，Model-Based Reinforcement Learning

专知会员服务

89+阅读 · 2021年1月12日

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【ALT 2019 Tutorials】强化学习的探索性开发（Exploration-Exploitation in Reinforcement Learning）

【ALT 2019 Tutorials】强化学习的探索性开发（Exploration-Exploitation in Reinforcement Learning）

专知会员服务

34+阅读 · 2019年3月21日

热门VIP内容

开通专知VIP会员享更多权益服务

《战争形态演变：合成兵种防御主导模式探析》48页slides

人工智能驱动弹药制造现代化：美国陆军转型之路

《多域空战指挥体系：驾驭复杂性的艺术》

构建军事人工智能信任体系始于破除黑盒机制

相关资讯

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【新书发布】原作者MarcG.Bellemare发布315页分布强化学习书籍(DistributionalRL)

【新书发布】原作者MarcG.Bellemare发布315页分布强化学习书籍(DistributionalRL)

深度强化学习实验室

1+阅读 · 2022年1月11日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning

A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning

Arxiv

0+阅读 · 2022年4月20日

A sojourn-based approach to semi-Markov Reinforcement Learning

Arxiv

0+阅读 · 2022年4月20日

When Is Partially Observable Reinforcement Learning Not Scary?

Arxiv

0+阅读 · 2022年4月19日

COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation

Arxiv

0+阅读 · 2022年4月19日

Efficient Bayesian Policy Reuse with a Scalable Observation Model in Deep Reinforcement Learning

Arxiv

0+阅读 · 2022年4月19日

Integrated and Adaptive Guidance and Control for Endoatmospheric Missiles via Reinforcement Learning

Arxiv

0+阅读 · 2022年4月18日

CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning

CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning

Arxiv

0+阅读 · 2022年4月18日

A Reinforcement Learning Approach to Parameter Selection for Distributed Optimal Power Flow

Arxiv

0+阅读 · 2022年4月15日

Testing distributional assumptions of learning algorithms

Arxiv

0+阅读 · 2022年4月14日

Decentralized and Communication-Free Multi-Robot Navigation through Distributed Games

Arxiv

40+阅读 · 2021年9月15日

相关基金

移动云计算复杂网络环境下任务粒度的应用划分和调度方法

国家自然科学基金

0+阅读 · 2015年12月31日

氧化应激相关蛋白p66Shc和GDF1在砷介导的心脏毒性中的作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

云计算环境下移动Agent系统信任安全关键技术研究

国家自然科学基金

2+阅读 · 2014年12月31日

面向超多目标优化的分解进化算法

国家自然科学基金

0+阅读 · 2014年12月31日

复杂海况下的海事无人艇路径规划研究

国家自然科学基金

2+阅读 · 2013年12月31日

复杂动态场景中鲁棒的视觉跟踪算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

因果推断的统计方法

国家自然科学基金

26+阅读 · 2011年12月31日

多模式杆式移动机构

国家自然科学基金

1+阅读 · 2011年12月31日

自主车辆的高质量三维场景认知与导航避障控制方法研究

国家自然科学基金

1+阅读 · 2011年12月31日

分布式数据流的集成模式挖掘模型和概念漂移检测算法研究

国家自然科学基金

2+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员