Policy learning using historical observational data is an important problem with widespread applications, such as choosing which offers, prices, or advertisements to send to customers, or which medication to prescribe to a patient. However, the existing literature rests on the crucial assumption that the future environment in which the learned policy will be deployed is the same as the past environment that generated the data, an assumption that is often false or too coarse an approximation. In this paper, we lift this assumption and aim to learn a distributionally robust policy from incomplete observational data. We first present a policy evaluation procedure that allows us to assess how well a policy performs under the worst-case environment shift. We then establish a central-limit-theorem-type guarantee for this proposed policy evaluation scheme. Leveraging this evaluation scheme, we further propose a novel learning algorithm that learns a policy robust to adversarial perturbations and unknown covariate shifts, with a performance guarantee based on the theory of uniform convergence. Finally, we empirically test the effectiveness of our proposed algorithm on synthetic datasets and demonstrate that it provides robustness that standard policy learning algorithms lack. We conclude the paper with a comprehensive application of our methods to a real-world voting dataset.
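To make the worst-case evaluation idea concrete, here is a generic sketch (not the paper's actual estimator) of a standard distributionally robust construction: the minimum expected reward of a policy over all distributions within a KL-divergence ball of radius `delta` around the empirical reward distribution, computed via its well-known one-dimensional dual over a multiplier `lam`. The function name, the grid search, and the choice of a KL ball are illustrative assumptions.

```python
import numpy as np

def worst_case_value(rewards, delta, lam_grid=None):
    """Worst-case mean reward over a KL ball of radius `delta` (illustrative sketch).

    Uses the dual representation
        inf_{Q: KL(Q||P_n) <= delta} E_Q[R]
            = sup_{lam > 0} -lam * log E_{P_n}[exp(-R / lam)] - lam * delta,
    approximated here by a coarse grid search over lam.
    """
    rewards = np.asarray(rewards, dtype=float)
    if lam_grid is None:
        lam_grid = np.logspace(-3, 4, 400)  # assumed range; widen if needed
    best = -np.inf
    for lam in lam_grid:
        z = -rewards / lam
        m = z.max()  # log-sum-exp shift for numerical stability
        log_mgf = m + np.log(np.exp(z - m).mean())
        best = max(best, -lam * log_mgf - lam * delta)
    return best
```

As a sanity check, a tiny radius recovers the ordinary empirical mean, while a large radius drives the value down toward the minimum observed reward, which is exactly the conservatism that robust policy evaluation is meant to capture.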