Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy. In recommender systems, this is especially challenging due to the imbalance in logged data: some items are recommended, and thus logged, more frequently than others. The problem is exacerbated when recommending lists of items, because the action space is combinatorial. To address this challenge, we study pessimistic off-policy optimization for learning to rank. The key idea is to compute lower confidence bounds on the parameters of click models and then return the list with the highest pessimistic estimate of its value. This approach is computationally efficient, and we analyze it. We study its Bayesian and frequentist variants, and overcome the limitation of an unknown prior by incorporating empirical Bayes. To demonstrate the empirical effectiveness of our approach, we compare it to off-policy optimizers that use inverse propensity scores or neglect uncertainty. Our approach outperforms all baselines, and is both robust and general.
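As a rough illustration of the pessimistic idea (a minimal sketch, not the paper's exact method), the code below assumes a simplified click model in which a list's value decomposes over per-item attraction probabilities. Each item's attraction is estimated pessimistically by a lower quantile of its Beta posterior, and the returned list is the top-k items by that lower confidence bound. All function and variable names here are illustrative.

```python
import numpy as np
from scipy.stats import beta as beta_dist


def pessimistic_ranking(clicks, impressions, k, alpha=1.0, beta=1.0, quantile=0.05):
    """Return the top-k items ranked by a lower confidence bound (LCB)
    on their attraction probability.

    clicks[i], impressions[i]: logged click and impression counts for item i.
    With a Beta(alpha, beta) prior, the posterior is
    Beta(alpha + clicks, beta + non_clicks); the LCB is its `quantile`-th
    quantile, i.e. a pessimistic estimate of the item's value.
    """
    non_clicks = impressions - clicks
    lcb = beta_dist.ppf(quantile, alpha + clicks, beta + non_clicks)
    # The list with the highest pessimistic value under this simplified
    # model is simply the k items with the largest LCBs.
    return np.argsort(-lcb)[:k]


# Rarely-logged items get wide posteriors and thus low LCBs, so the
# pessimistic policy avoids over-recommending them on scant evidence.
clicks = np.array([50, 3, 120])
impressions = np.array([200, 5, 1000])
print(pessimistic_ranking(clicks, impressions, k=2))
```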