We introduce the \emph{Correlated Preference Bandits} problem with random utility-based choice models (RUMs), where the goal is to identify the best item from a given pool of $n$ items through online subsetwise preference feedback. We investigate whether models with a simple correlation structure, e.g., low rank, can result in faster learning rates. While we show that the problem can be impossible to solve for general `low-rank' choice models, faster learning rates can be attained under more structured item correlations. In particular, we introduce a new class of \emph{Block-Rank} based RUM models, where the best item is shown to be $(\epsilon,\delta)$-PAC learnable with only $O(r \epsilon^{-2} \log(n/\delta))$ samples. This improves on the standard sample complexity bound of $\tilde{O}(n\epsilon^{-2} \log(1/\delta))$ known for the usual learning algorithms, which may fail to exploit the item correlations ($r \ll n$). We complement the above sample complexity with a matching lower bound (up to logarithmic factors), justifying the tightness of our analysis. Surprisingly, we also show a lower bound of $\Omega(n\epsilon^{-2}\log(1/\delta))$ when the learner is forced to play duels instead of larger subsetwise queries. Further, we extend the results to a more general `\emph{noisy Block-Rank}' model, which ensures the robustness of our techniques. Overall, our results justify the advantage of playing subsetwise queries over pairwise preferences $(k=2)$: we show that the latter provably fails to exploit correlation.