Optimal anytime regret with two experts - Zhuanzhi Papers


August 26, 2021

Optimal anytime regret with two experts


Nicholas J. A. Harvey, Christopher Liaw, Edwin Perkins, Sikander Randhawa

From arXiv, 47 pages, 1 figure

We consider the classical problem of prediction with expert advice. In the fixed-time setting, where the time horizon is known in advance, algorithms that achieve the optimal regret are known when there are two, three, or four experts or when the number of experts is large. Much less is known about the problem in the anytime setting, where the time horizon is not known in advance. No minimax optimal algorithm was previously known in the anytime setting, regardless of the number of experts. Even for the case of two experts, Luo and Schapire have left open the problem of determining the optimal algorithm. We design the first minimax optimal algorithm for minimizing regret in the anytime setting. We consider the case of two experts, and prove that the optimal regret is $\gamma \sqrt{t} / 2$ at all time steps $t$, where $\gamma$ is a natural constant that arose 35 years ago in studying fundamental properties of Brownian motion. The algorithm is designed by considering a continuous analogue of the regret problem, which is solved using ideas from stochastic calculus.
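To make the anytime setting concrete, here is a minimal sketch of a two-expert prediction game in which regret is tracked at every time step rather than only at a fixed horizon. It is not the minimax optimal algorithm from the paper. It assumes the standard exponentially weighted forecaster (Hedge) with a time-varying learning rate $\eta_t = \sqrt{8 \ln 2 / t}$, a common baseline whose regret also grows like $\sqrt{t}$. The function name `hedge_anytime_regret` and the cost sequences are hypothetical and chosen only for illustration.

```python
import math
import random


def hedge_anytime_regret(costs):
    """Run Hedge with a decreasing learning rate on two experts.

    costs: list of (c1, c2) pairs, each cost in [0, 1].
    Returns the regret after every step t = 1, 2, ..., where regret is the
    algorithm's expected cumulative cost minus the best expert's cumulative cost.
    NOTE: this is a baseline sketch, not the paper's optimal algorithm.
    """
    n_experts = 2
    cum = [0.0, 0.0]      # cumulative cost of each expert so far
    alg_cost = 0.0        # expected cumulative cost of the algorithm
    regrets = []
    for t, cost in enumerate(costs, start=1):
        # The learning rate depends only on the current time, not on a horizon.
        eta = math.sqrt(8.0 * math.log(n_experts) / t)
        # Subtract the smaller cumulative cost for numerical stability.
        best = min(cum)
        weights = [math.exp(-eta * (c - best)) for c in cum]
        total = sum(weights)
        probs = [w / total for w in weights]
        # The algorithm pays the probability-weighted cost of the two experts.
        alg_cost += probs[0] * cost[0] + probs[1] * cost[1]
        cum[0] += cost[0]
        cum[1] += cost[1]
        regrets.append(alg_cost - min(cum))
    return regrets


if __name__ == "__main__":
    random.seed(0)
    costs = [(random.random(), random.random()) for _ in range(2000)]
    regrets = hedge_anytime_regret(costs)
    for t in (10, 100, 1000, 2000):
        # Regret divided by sqrt(t) stays bounded if regret is O(sqrt(t)).
        print(t, regrets[t - 1] / math.sqrt(t))
```

Dividing the regret by $\sqrt{t}$ gives a quantity that can be compared against the constant $\gamma / 2$ in the abstract. The baseline above is not expected to match that constant against a worst-case adversary.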

