We study identifying user clusters in contextual multi-armed bandits (MAB). Contextual MAB is an effective tool for many real applications, such as content recommendation and online advertisement. In practice, user dependency plays an essential role in the user's actions, and thus the rewards. Clustering similar users can improve the quality of reward estimation, which in turn leads to more effective content recommendation and targeted advertising. Different from traditional clustering settings, we cluster users based on the unknown bandit parameters, which will be estimated incrementally. In particular, we define the problem of cluster detection in contextual MAB, and propose a bandit algorithm, LOCB, embedded with local clustering procedure. And, we provide theoretical analysis about LOCB in terms of the correctness and efficiency of clustering and its regret bound. Finally, we evaluate the proposed algorithm from various aspects, which outperforms state-of-the-art baselines.
翻译:本文研究如何在情境多臂老虎机中识别用户聚类。情境多臂老虎机是很多实际应用(如内容推荐和在线广告)的有效工具。在实践中,用户间的依赖关系对于用户的动作以及获取的奖励具有重要作用。聚类相似的用户可以提高奖励估计的质量,从而可以更有效地进行内容推荐和定向广告。不同于传统的聚类设置,我们基于未知的老虎机参数来对用户进行聚类,这些参数将会被逐步地估计。特别地,我们定义了情境多臂老虎机下的聚类检测问题,并设计了嵌入了本地聚类程序的老虎机算法:LOCB。同时,我们在正确性和聚类效率以及后悔界三方面对LOCB进行了理论分析。最后,我们从多个方面评估了所提出的算法,结果表明LOCB优于现有的基线算法。