Out of the rich family of generalized linear bandits, perhaps the most well-studied are logistic bandits, which are used in problems with binary rewards: for instance, when the learner/agent tries to maximize profit from a user who can select one of two possible outcomes (e.g., `click' vs. `no-click'). Despite remarkable recent progress and improved algorithms for logistic bandits, existing works do not address practical situations where the number of outcomes the user can select is larger than two (e.g., `click', `show me later', `never show again', `no-click'). In this paper, we study such an extension. We use the multinomial logit (MNL) model for the probability of each of the $K+1\geq 2$ possible outcomes (the $+1$ stands for the `no-click' outcome): we assume that, for a learner's action $\mathbf{x}_t$, the user selects one of the $K+1\geq 2$ outcomes, say outcome $i$, according to an MNL probabilistic model with corresponding unknown parameter $\bar{\boldsymbol\theta}_{\ast i}$. Each outcome $i$ is also associated with a revenue parameter $\rho_i$, and the goal is to maximize the expected revenue. For this problem, we present MNL-UCB, an upper confidence bound (UCB)-based algorithm that achieves regret $\tilde{\mathcal{O}}(dK\sqrt{T})$ with small dependence on problem-dependent constants that can otherwise be arbitrarily large and lead to loose regret bounds. We present numerical simulations that corroborate our theoretical results.
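For concreteness, the following is a sketch of the standard MNL choice probabilities and the resulting expected revenue; the indexing convention used here (outcome $0$ denoting `no-click', with $\rho_0 = 0$ by convention) is an illustrative assumption and the paper's exact parameterization may differ:
\[
\mathbb{P}\big(y_t = i \mid \mathbf{x}_t\big) = \frac{\exp\big(\mathbf{x}_t^\top \bar{\boldsymbol\theta}_{\ast i}\big)}{1 + \sum_{j=1}^{K} \exp\big(\mathbf{x}_t^\top \bar{\boldsymbol\theta}_{\ast j}\big)}, \quad i \in \{1,\dots,K\},
\qquad
\mathbb{P}\big(y_t = 0 \mid \mathbf{x}_t\big) = \frac{1}{1 + \sum_{j=1}^{K} \exp\big(\mathbf{x}_t^\top \bar{\boldsymbol\theta}_{\ast j}\big)},
\]
so that the expected revenue of an action $\mathbf{x}_t$ is $\sum_{i=0}^{K} \rho_i\,\mathbb{P}\big(y_t = i \mid \mathbf{x}_t\big)$, which the learner seeks to maximize over rounds $t=1,\dots,T$.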