分布式强力批量批量背景强盗 (Distributional Robust Batch Contextual Bandits) - 专知论文

会员服务 ·

0

稳健性 · 赌博机/老虎机 · 上下文赌博机/上下文老虎机 · 学成 · 回合 ·

2021 年 8 月 2 日

Distributional Robust Batch Contextual Bandits

翻译：分布式强力批量批量背景强盗

Nian Si,Fan Zhang,Zhengyuan Zhou,Jose Blanchet

from arxiv, The short version has been accepted in ICML 2020

Policy learning using historical observational data is an important problem that has found widespread applications. Examples include selecting offers, prices, advertisements to send to customers, as well as selecting which medication to prescribe to a patient. However, existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment that has generated the data--an assumption that is often false or too coarse an approximation. In this paper, we lift this assumption and aim to learn a distributional robust policy with incomplete (bandit) observational data. We propose a novel learning algorithm that is able to learn a robust policy to adversarial perturbations and unknown covariate shifts. We first present a policy evaluation procedure in the ambiguous environment and then give a performance guarantee based on the theory of uniform convergence. Additionally, we also give a heuristic algorithm to solve the distributional robust policy learning problems efficiently. Finally, we demonstrate the robustness of our methods in the synthetic and real-world datasets.

翻译：使用历史观测数据进行政策学习是一个重要问题,已经得到广泛应用。例子包括选择报价、价格、向客户发送广告,以及选择向病人开药的药物。然而,现有文献所依据的关键假设是,今后运用所学政策的环境与以往产生数据-假设的环境相同,而以往的环境往往是假的或过于粗略。在本文中,我们取消这一假设,目的是学习一种分布式强的政策,其中含有不完整的(带式)观测数据。我们提出了一种新的学习算法,能够学习一种强有力的政策,以对抗性扰动和未知的共变换变化。我们首先在模糊的环境中提出政策评价程序,然后根据统一趋同理论提供业绩保证。此外,我们还给出了一种超理论算法,以有效解决分布式强的政策学习问题。最后,我们展示了我们在合成和真实世界数据集中的方法的稳健性。

0

相关内容

稳健性

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【重磅】2021年IEEE Fellow出炉！ 282位新晋升会士！七十多位华人当选！

专知会员服务

23+阅读 · 2020年11月25日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

【ICML2020】拉普拉斯正则化小样本学习，Laplacian Regularized Few-Shot Learning

【ICML2020】拉普拉斯正则化小样本学习，Laplacian Regularized Few-Shot Learning

专知会员服务

77+阅读 · 2020年6月28日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

鲁棒机器学习相关文献集

鲁棒机器学习相关文献集

专知

8+阅读 · 2019年8月18日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Robust Distributed Accelerated Stochastic Gradient Methods for Multi-Agent Networks

Arxiv

0+阅读 · 2021年10月4日

Batch size-invariance for policy optimization

Arxiv

0+阅读 · 2021年10月1日

Efficient Importance Sampling for Large Sums of Independent and Identically Distributed Random Variables

Arxiv

0+阅读 · 2021年10月1日

Approximate Regions of Attraction in Learning with Decision-Dependent Distributions

Arxiv

0+阅读 · 2021年10月1日

Multi-index Antithetic Stochastic Gradient Algorithm

Arxiv

0+阅读 · 2021年9月30日

Optimal Learning for Structured Bandits

Arxiv

0+阅读 · 2021年9月29日

Sequential Estimation under Multiple Resources: a Bandit Point of View

Arxiv

0+阅读 · 2021年9月29日

Policy Gradient Bayesian Robust Optimization for Imitation Learning

Arxiv

5+阅读 · 2021年6月11日

Attribute-Guided Adversarial Training for Robustness to Natural Perturbations

Arxiv

15+阅读 · 2020年12月3日

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

Arxiv

8+阅读 · 2018年12月18日

VIP会员

文章信息

相关主题

赌博机/老虎机

上下文赌博机/上下文老虎机

相关VIP内容

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【重磅】2021年IEEE Fellow出炉！ 282位新晋升会士！七十多位华人当选！

专知会员服务

23+阅读 · 2020年11月25日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

【ICML2020】拉普拉斯正则化小样本学习，Laplacian Regularized Few-Shot Learning

【ICML2020】拉普拉斯正则化小样本学习，Laplacian Regularized Few-Shot Learning

专知会员服务

77+阅读 · 2020年6月28日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

美陆军五大转型方向

一种Agent自主性风险评估框架 | 最新文献

实时无人机指令处理：一种面向无人机系统的大语言模型方法

基于动态知识图谱的人工智能代理自主研究周期 | 文献

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

鲁棒机器学习相关文献集

鲁棒机器学习相关文献集

专知

8+阅读 · 2019年8月18日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Robust Distributed Accelerated Stochastic Gradient Methods for Multi-Agent Networks

Arxiv

0+阅读 · 2021年10月4日

Batch size-invariance for policy optimization

Arxiv

0+阅读 · 2021年10月1日

Efficient Importance Sampling for Large Sums of Independent and Identically Distributed Random Variables

Arxiv

0+阅读 · 2021年10月1日

Approximate Regions of Attraction in Learning with Decision-Dependent Distributions

Arxiv

0+阅读 · 2021年10月1日

Multi-index Antithetic Stochastic Gradient Algorithm

Arxiv

0+阅读 · 2021年9月30日

Optimal Learning for Structured Bandits

Arxiv

0+阅读 · 2021年9月29日

Sequential Estimation under Multiple Resources: a Bandit Point of View

Arxiv

0+阅读 · 2021年9月29日

Policy Gradient Bayesian Robust Optimization for Imitation Learning

Arxiv

5+阅读 · 2021年6月11日

Attribute-Guided Adversarial Training for Robustness to Natural Perturbations

Arxiv

15+阅读 · 2020年12月3日

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

Arxiv

8+阅读 · 2018年12月18日

微信扫码咨询专知VIP会员