In this paper, we consider the contextual variant of the MNL-Bandit problem. More specifically, we consider a dynamic set optimization problem, where a decision-maker offers a subset (assortment) of products to a consumer and observes the response in every round. Consumers purchase products to maximize their utility. We assume that each product is described by a set of attributes and that the mean utility of a product is linear in the values of these attributes. We model consumer choice behavior using the widely used Multinomial Logit (MNL) model and consider the decision-maker's problem of dynamically learning the model parameters while optimizing cumulative revenue over the selling horizon $T$. Though this problem has attracted considerable attention in recent times, many existing methods involve solving an intractable non-convex optimization problem, and their theoretical performance guarantees depend on a problem-dependent parameter that can be prohibitively large. In particular, existing algorithms for this problem have regret bounded by $O(\sqrt{\kappa d T})$, where $\kappa$ is a problem-dependent constant that can depend exponentially on the number of attributes. In this paper, we propose an optimistic algorithm and show that its regret is bounded by $O(\sqrt{dT} + \kappa)$, significantly improving on existing methods. Further, we propose a convex relaxation of the optimization step, which allows for tractable decision-making while retaining the favorable regret guarantee.
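For concreteness, the following is a minimal sketch of the standard MNL choice probabilities under linear mean utilities; the notation ($x_i$ for the attribute vector of product $i$, $\theta$ for the unknown parameter, $S$ for the offered assortment, and a no-purchase option with utility normalized to zero) is illustrative rather than taken verbatim from the paper:
$$
\mathbb{P}(\text{purchase } i \mid S) \;=\; \frac{e^{x_i^\top \theta}}{1 + \sum_{j \in S} e^{x_j^\top \theta}},
\qquad
\mathbb{P}(\text{no purchase} \mid S) \;=\; \frac{1}{1 + \sum_{j \in S} e^{x_j^\top \theta}}.
$$
Under this model, the decision-maker repeatedly selects an assortment $S$, observes which product (if any) is purchased, and must balance learning $\theta$ against maximizing cumulative expected revenue over the horizon $T$.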