We study the problem of {\em online} low-rank matrix completion with $\mathsf{M}$ users, $\mathsf{N}$ items and $\mathsf{T}$ rounds. In each round, the algorithm recommends one item per user, for which it gets a (noisy) reward sampled from a low-rank user-item preference matrix. The goal is to design a method with sub-linear regret (in $\mathsf{T}$) and nearly optimal dependence on $\mathsf{M}$ and $\mathsf{N}$. The problem can be easily mapped to the standard multi-armed bandit problem where each item is an {\em independent} arm, but that leads to poor regret because the correlation between arms and users is not exploited. On the other hand, exploiting the low-rank structure of the reward matrix is challenging due to the non-convexity of the low-rank manifold. We first demonstrate that the low-rank structure can be exploited using a simple explore-then-commit (ETC) approach that ensures a regret of $O(\mathsf{polylog} (\mathsf{M}+\mathsf{N}) \mathsf{T}^{2/3})$. That is, roughly only $\mathsf{polylog} (\mathsf{M}+\mathsf{N})$ item recommendations are required per user to get a non-trivial solution. We then improve our result for the rank-$1$ setting, which in itself is quite challenging and encapsulates some of the key issues. Here, we propose \textsc{OCTAL} (Online Collaborative filTering using iterAtive user cLustering), which guarantees a nearly optimal regret of $O(\mathsf{polylog} (\mathsf{M}+\mathsf{N}) \mathsf{T}^{1/2})$. OCTAL is based on a novel technique of clustering users that allows iterative elimination of items and leads to a nearly optimal minimax rate.
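The explore-then-commit idea can be illustrated with a small NumPy sketch. This is a generic illustration, not the authors' ETC procedure or OCTAL. The function name `explore_then_commit`, the exploration schedule (each user cycles through a random permutation of the items), and the completion step (a rescaled truncated SVD of the zero-filled reward matrix, a simple spectral estimator swapped in for illustration) are all assumptions. Regret is measured here as expected regret summed over users and rounds, $\sum_{t}\sum_{u}\big(\max_j P_{uj} - P_{u,j_u(t)}\big)$, where $j_u(t)$ is the item shown to user $u$ in round $t$. The paper's exact definition may differ, for example in normalization.

```python
import numpy as np


def explore_then_commit(P, T, T0, rank, noise_std=0.1, seed=0):
    """Hypothetical ETC sketch for an M x N low-rank expected-reward matrix P.

    Explore (T0 rounds): each user cycles through a random permutation of the
    items and receives a noisy reward P[u, j] + Gaussian noise.
    Commit (T - T0 rounds): estimate P with a rank-`rank` spectral completion
    and recommend each user's estimated best item.
    Returns (committed items per user, cumulative expected regret).
    """
    if T0 < 1:
        raise ValueError("need at least one exploration round")
    rng = np.random.default_rng(seed)
    M, N = P.shape
    users = np.arange(M)
    sums = np.zeros((M, N))
    counts = np.zeros((M, N))
    best = P.max(axis=1)  # per-user optimal expected reward
    regret = 0.0
    perms = np.array([rng.permutation(N) for _ in range(M)])

    # Exploration phase: one noisy reward per user per round.
    for t in range(min(T0, T)):
        items = perms[:, t % N]
        rewards = P[users, items] + noise_std * rng.standard_normal(M)
        sums[users, items] += rewards  # (user, item) pairs are unique within a round
        counts[users, items] += 1
        regret += float(np.sum(best - P[users, items]))

    # Spectral completion (assumed estimator): rescale the zero-filled mean
    # matrix by the observed fraction, then keep the top-`rank` singular triplets.
    observed = counts > 0
    Y = np.where(observed, sums / np.maximum(counts, 1), 0.0)
    p_hat = observed.mean()
    U, s, Vt = np.linalg.svd(Y / p_hat, full_matrices=False)
    P_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank]

    # Commit phase: greedy recommendation from the estimate for the remaining rounds.
    committed = P_hat.argmax(axis=1)
    regret += max(T - T0, 0) * float(np.sum(best - P[users, committed]))
    return committed, regret
```

For example, with `P = A @ B.T` for random `A` of shape `(30, 2)` and `B` of shape `(20, 2)`, calling `explore_then_commit(P, T=100, T0=20, rank=2, noise_std=0.0)` observes every entry once, so the committed items coincide with `P.argmax(axis=1)`. The trade-off ETC has to manage is visible in the two regret terms. A longer exploration phase costs regret on every exploratory round but yields a better estimate, and hence less regret per committed round. Generic ETC analyses tune `T0` to balance these two costs.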