An Efficient Algorithm for Cooperative Semi-Bandits - Zhuanzhi (专知) Papers

February 9, 2021

Riccardo Della Vecchia, Tommaso Cesari

We consider the problem of asynchronous online combinatorial optimization on a network of communicating agents. At each time step, some of the agents are stochastically activated, requested to make a prediction, and the system pays the corresponding loss. Then, neighbors of active agents receive semi-bandit feedback and exchange some succinct local information. The goal is to minimize the network regret, defined as the difference between the cumulative loss of the predictions of active agents and that of the best action in hindsight, selected from a combinatorial decision set. The main challenge in such a context is to control the computational complexity of the resulting algorithm while retaining minimax optimal regret guarantees. We introduce Coop-FTPL, a cooperative version of the well-known Follow The Perturbed Leader algorithm, that implements a new loss estimation procedure generalizing the Geometric Resampling of Neu and Bartók [2013] to our setting. Assuming that the elements of the decision set are $k$-dimensional binary vectors with at most $m$ non-zero entries and $\alpha_1$ is the independence number of the network, we show that the expected regret of our algorithm after $T$ time steps is of order $Q\sqrt{mkT\log(k)\,(k\alpha_1/Q + m)}$, where $Q$ is the total activation probability mass. Furthermore, we prove that this is only $\sqrt{k \log k}$ away from the best achievable rate and that Coop-FTPL has a state-of-the-art $T^{3/2}$ worst-case computational complexity.
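
To make the two building blocks named in the abstract more concrete, the following is a minimal single-agent sketch of Follow The Perturbed Leader combined with Geometric Resampling in the spirit of Neu and Bartók [2013]. It is not Coop-FTPL itself: the network, the stochastic activations, and the exchange of information between neighbors are all omitted. The decision set is assumed to consist of binary vectors with exactly `m` ones, the perturbations are assumed to be i.i.d. exponential, and the function names and the parameters `eta` (learning rate) and `cap` (resampling truncation) are hypothetical choices made for illustration, not values taken from the paper.

```python
import numpy as np


def ftpl_action(cum_loss_est, eta, m, rng):
    """Draw one FTPL action: a k-dim binary vector with exactly m ones.

    Minimizes v . (eta * cum_loss_est - z) over m-sparse binary v,
    where z is a fresh vector of i.i.d. exponential perturbations.
    """
    z = rng.exponential(size=cum_loss_est.shape)
    scores = eta * cum_loss_est - z
    chosen = np.argsort(scores)[:m]  # the m smallest perturbed losses
    v = np.zeros(len(cum_loss_est), dtype=int)
    v[chosen] = 1
    return v


def geometric_resampling(cum_loss_est, eta, m, played, cap, rng):
    """Estimate 1 / P(component i is played) for each played component.

    For every component i with played[i] == 1, count how many fresh FTPL
    draws are needed until i is selected again, truncated at `cap`.
    Components that were not played get 0.
    """
    counts = np.zeros(len(cum_loss_est))
    waiting = played.astype(bool).copy()
    for r in range(1, cap + 1):
        v = ftpl_action(cum_loss_est, eta, m, rng)
        hit = waiting & (v == 1)
        counts[hit] = r
        waiting &= ~hit
        if not waiting.any():
            break
    counts[waiting] = cap  # truncate components never re-drawn
    return counts


def run_ftpl_gr(losses, m, eta, cap, seed=0):
    """Run single-agent FTPL with Geometric Resampling on a T x k loss matrix."""
    rng = np.random.default_rng(seed)
    T, k = losses.shape
    cum_loss_est = np.zeros(k)
    total_loss = 0.0
    for t in range(T):
        v = ftpl_action(cum_loss_est, eta, m, rng)
        total_loss += float(losses[t] @ v)
        # Semi-bandit feedback: only losses of played components are seen.
        counts = geometric_resampling(cum_loss_est, eta, m, v, cap, rng)
        cum_loss_est += counts * losses[t] * v
    return total_loss, cum_loss_est
```

The resampling count stands in for the inverse of the probability that a component is played, which is why the procedure never needs that probability in closed form; the truncation at `cap` bounds the work per round at the price of a small bias.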

