The principle of optimism in the face of uncertainty is one of the most widely used and successful ideas in multi-armed bandits and reinforcement learning. However, existing optimistic algorithms (primarily UCB and its variants) often cannot handle large context spaces. Essentially all existing well-performing algorithms for general contextual bandit problems rely on weighted action-allocation schemes, and theoretical guarantees for optimism-based algorithms are known only for restricted formulations. In this paper we study general contextual bandits under the realizability condition and propose a simple generic principle for designing optimistic algorithms, dubbed "Upper Counterfactual Confidence Bounds" (UCCB). We show that these algorithms are provably optimal and efficient in the presence of large context spaces. Key components of UCCB include: 1) a systematic analysis of confidence bounds in policy space rather than in action space; and 2) a potential-function perspective that expresses the power of optimism in the contextual setting. We further show how the UCCB principle can be extended to infinite action spaces by constructing confidence bounds via the newly introduced notion of "counterfactual action divergence."
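As a schematic illustration of what "optimism in policy space rather than action space" means (the symbols below are illustrative placeholders, not the paper's exact UCCB construction): writing $\Pi$ for the policy class, a generic optimistic rule selects
\[
\pi_t \;=\; \operatorname*{arg\,max}_{\pi \in \Pi}\bigl[\widehat{V}_t(\pi) + \mathrm{width}_t(\pi)\bigr],
\]
where $\widehat{V}_t(\pi)$ is an empirical estimate of the value of policy $\pi$ and $\mathrm{width}_t(\pi)$ is a confidence width attached to the policy as a whole. This contrasts with classical action-space UCB, which, for the observed context $x_t$, plays $a_t = \operatorname*{arg\,max}_{a}\bigl[\widehat{\mu}_t(x_t,a) + \mathrm{bonus}_t(x_t,a)\bigr]$ with a per-action bonus. The quantities $\widehat{V}_t$, $\mathrm{width}_t$, $\widehat{\mu}_t$, and $\mathrm{bonus}_t$ are hypothetical stand-ins; the specific counterfactual confidence bounds used by UCCB are defined in the body of the paper, not in this abstract.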