分布式强力批量批量背景强盗 (Distributional Robust Batch Contextual Bandits) - 专知论文

会员服务 ·

0

稳健性 · 赌博机/老虎机 · 上下文赌博机/上下文老虎机 · 学成 · 回合 ·

2021 年 6 月 17 日

Distributional Robust Batch Contextual Bandits

翻译：分布式强力批量批量背景强盗

Nian Si,Fan Zhang,Zhengyuan Zhou,Jose Blanchet

from arxiv, The short version has been accepted in ICML 2020

Policy learning using historical observational data is an important problem that has found widespread applications. Examples include selecting offers, prices, advertisements to send to customers, as well as selecting which medication to prescribe to a patient. However, existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment that has generated the data--an assumption that is often false or too coarse an approximation. In this paper, we lift this assumption and aim to learn a distributional robust policy with incomplete (bandit) observational data. We propose a novel learning algorithm that is able to learn a robust policy to adversarial perturbations and unknown covariate shifts. We first present a policy evaluation procedure in the ambiguous environment and then give a performance guarantee based on the theory of uniform convergence. Additionally, we also give a heuristic algorithm to solve the distributional robust policy learning problems efficiently. Finally, we demonstrate the robustness of our methods in the synthetic and real-world datasets.

翻译：使用历史观测数据进行政策学习是一个重要问题,已经得到广泛应用。例子包括选择报价、价格、向客户发送广告,以及选择向病人开药的药物。然而,现有文献所依据的关键假设是,今后运用所学政策的环境与以往产生数据-假设的环境相同,而以往的环境往往是假的或过于粗略。在本文中,我们取消这一假设,目的是学习一种分布式强的政策,其中含有不完整的(带式)观测数据。我们提出了一种新的学习算法,能够学习一种强有力的政策,以对抗性扰动和未知的共变换变化。我们首先在模糊的环境中提出政策评价程序,然后根据统一趋同理论提供业绩保证。此外,我们还给出了一种超理论算法,以有效解决分布式强的政策学习问题。最后,我们展示了我们在合成和真实世界数据集中的方法的稳健性。

0

相关内容

稳健性

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

45+阅读 · 2020年10月31日

用于计算药物开发和发现的图卷积网络，Graph convolutional networks for computational drug development and discovery

用于计算药物开发和发现的图卷积网络，Graph convolutional networks for computational drug development and discovery

专知会员服务

40+阅读 · 2020年7月14日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

【UIUC硬核书】统计学习理论，Statistical Learning Theory，213页pdf

【UIUC硬核书】统计学习理论，Statistical Learning Theory，213页pdf

专知会员服务

134+阅读 · 2020年4月14日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

A causal framework for distribution generalization

Arxiv

0+阅读 · 2021年8月17日

Coping with Label Shift via Distributionally Robust Optimisation

Arxiv

0+阅读 · 2021年8月17日

Efficient Hyperparameter Optimization under Multi-Source Covariate Shift

Arxiv

0+阅读 · 2021年8月16日

Toward Efficient Online Scheduling for Distributed Machine Learning Systems

Arxiv

0+阅读 · 2021年8月14日

Enhanced Regularizers for Attributional Robustness

Arxiv

0+阅读 · 2021年8月13日

Perturbed M-Estimation: A Further Investigation of Robust Statistics for Differential Privacy

Arxiv

0+阅读 · 2021年8月5日

Deep Stable Learning for Out-Of-Distribution Generalization

Arxiv

12+阅读 · 2021年4月16日

Privacy and Robustness in Federated Learning: Attacks and Defenses

Arxiv

35+阅读 · 2020年12月7日

Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

Arxiv

5+阅读 · 2020年4月2日

ADMM-based Networked Stochastic Variational Inference

Arxiv

3+阅读 · 2018年2月27日

VIP会员

文章信息

相关主题

赌博机/老虎机

上下文赌博机/上下文老虎机

相关VIP内容

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

45+阅读 · 2020年10月31日

用于计算药物开发和发现的图卷积网络，Graph convolutional networks for computational drug development and discovery

用于计算药物开发和发现的图卷积网络，Graph convolutional networks for computational drug development and discovery

专知会员服务

40+阅读 · 2020年7月14日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

【UIUC硬核书】统计学习理论，Statistical Learning Theory，213页pdf

【UIUC硬核书】统计学习理论，Statistical Learning Theory，213页pdf

专知会员服务

134+阅读 · 2020年4月14日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【牛津博士论文】零样本强化学习综述

《美军条令：陆军指挥官与规划人员地理空间指南》60页

战术边缘指挥控制：防务面临的核心挑战

迈向开放世界检测：综述

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

相关论文

A causal framework for distribution generalization

Arxiv

0+阅读 · 2021年8月17日

Coping with Label Shift via Distributionally Robust Optimisation

Arxiv

0+阅读 · 2021年8月17日

Efficient Hyperparameter Optimization under Multi-Source Covariate Shift

Arxiv

0+阅读 · 2021年8月16日

Toward Efficient Online Scheduling for Distributed Machine Learning Systems

Arxiv

0+阅读 · 2021年8月14日

Enhanced Regularizers for Attributional Robustness

Arxiv

0+阅读 · 2021年8月13日

Perturbed M-Estimation: A Further Investigation of Robust Statistics for Differential Privacy

Arxiv

0+阅读 · 2021年8月5日

Deep Stable Learning for Out-Of-Distribution Generalization

Arxiv

12+阅读 · 2021年4月16日

Privacy and Robustness in Federated Learning: Attacks and Defenses

Arxiv

35+阅读 · 2020年12月7日

Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

Arxiv

5+阅读 · 2020年4月2日

ADMM-based Networked Stochastic Variational Inference

Arxiv

3+阅读 · 2018年2月27日

微信扫码咨询专知VIP会员