We study the problem of \textit{online} low-rank matrix completion with $\mathsf{M}$ users, $\mathsf{N}$ items and $\mathsf{T}$ rounds. In each round, we recommend one item per user. For each recommendation, we obtain a (noisy) reward sampled from a low-rank user-item reward matrix. The goal is to design an online method with sub-linear regret (in $\mathsf{T}$). While the problem can be mapped to the standard multi-armed bandit problem where each item is an \textit{independent} arm, this reduction leads to poor regret because the correlation between arms and users is not exploited. In contrast, exploiting the low-rank structure of the reward matrix is challenging due to the non-convexity of the low-rank manifold. We overcome this challenge using an explore-then-commit (ETC) approach that ensures a regret of $O(\mathsf{polylog} (\mathsf{M}+\mathsf{N}) \mathsf{T}^{2/3})$. That is, roughly only $\mathsf{polylog} (\mathsf{M}+\mathsf{N})$ item recommendations are required per user to get a non-trivial solution. We further improve our result for the rank-$1$ setting. Here, we propose a novel algorithm OCTAL (Online Collaborative filTering using iterAtive user cLustering) that ensures a nearly optimal regret bound of $O(\mathsf{polylog} (\mathsf{M}+\mathsf{N}) \mathsf{T}^{1/2})$. Our algorithm uses a novel technique of clustering users and eliminating items jointly and iteratively, which allows us to obtain a nearly minimax optimal rate in $\mathsf{T}$.
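
The $\mathsf{T}^{2/3}$ rate is the one that a generic explore-then-commit argument typically produces, and the following heuristic sketch indicates why. It is not the paper's analysis: the exploration length $m$, the estimation error $\varepsilon(m)$, and its assumed decay rate are illustrative symbols introduced here, and all dependence on $\mathsf{M}$, $\mathsf{N}$, the rank, and the noise level is suppressed. Suppose each user explores for $m$ rounds, each costing at most constant regret, and that the resulting estimate of the reward matrix has entrywise error $\varepsilon(m) \lesssim 1/\sqrt{m}$. Committing to the item with the highest estimated reward then costs at most $2\varepsilon(m)$ per round, so the per-user regret satisfies
\[
  R(\mathsf{T}) \;\lesssim\; \underbrace{m}_{\text{explore}} \;+\; \underbrace{(\mathsf{T}-m)\cdot 2\varepsilon(m)}_{\text{commit}} \;\lesssim\; m + \frac{\mathsf{T}}{\sqrt{m}} .
\]
The two terms balance when $m \asymp \mathsf{T}/\sqrt{m}$, that is, when $m \asymp \mathsf{T}^{2/3}$, which gives
\[
  R(\mathsf{T}) \;\lesssim\; \mathsf{T}^{2/3} .
\]
Under these assumptions, a fixed exploration budget cannot do better than $\mathsf{T}^{2/3}$, which is consistent with the abstract's use of an iterative elimination scheme, rather than a single commit step, to reach the $\mathsf{T}^{1/2}$ rate in the rank-$1$ setting.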