We study the problem of identifying clusters of users in contextual multi-armed bandits (MAB). Contextual MAB is an effective tool for many real applications, such as content recommendation and online advertisement. In practice, user dependency plays an essential role in users' actions and, consequently, their rewards. Clustering similar users can improve the quality of reward estimation, which in turn leads to more effective content recommendation and targeted advertising. Unlike traditional clustering settings, we cluster users based on the unknown bandit parameters, which are estimated incrementally. In particular, we define the problem of cluster detection in contextual MAB and propose a bandit algorithm, LOCB, embedded with a local clustering procedure. We also provide a theoretical analysis of LOCB in terms of the correctness and efficiency of clustering and its regret bound. Finally, we evaluate the proposed algorithm from various aspects, showing that it outperforms state-of-the-art baselines.
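To make the central idea concrete, the following minimal Python sketch (a hypothetical illustration, not the LOCB algorithm itself) shows one way to cluster users by incrementally estimated bandit parameters: each user maintains a ridge-regression estimate of an unknown linear reward parameter, and users whose estimates lie close together are grouped into the same cluster. All class names, the greedy grouping rule, and the distance threshold are assumptions made for this example only.

```python
# Hypothetical sketch: cluster users by incrementally estimated bandit parameters.
# This is NOT the LOCB algorithm; it only illustrates the general idea of grouping
# users whose estimated reward parameters are close.
import numpy as np


class UserEstimator:
    """Incremental ridge-regression estimate of one user's reward parameter."""

    def __init__(self, dim: int, lam: float = 1.0):
        self.A = lam * np.eye(dim)   # regularized Gram matrix
        self.b = np.zeros(dim)       # accumulated context-weighted rewards

    def update(self, context: np.ndarray, reward: float) -> None:
        self.A += np.outer(context, context)
        self.b += reward * context

    @property
    def theta(self) -> np.ndarray:
        # Current estimate of the user's unknown bandit parameter.
        return np.linalg.solve(self.A, self.b)


def cluster_users(estimators: dict, threshold: float) -> list:
    """Greedily group users whose parameter estimates are within `threshold`."""
    users = list(estimators)
    clusters, assigned = [], set()
    for u in users:
        if u in assigned:
            continue
        group = [u]
        assigned.add(u)
        for v in users:
            if v not in assigned and np.linalg.norm(
                estimators[u].theta - estimators[v].theta
            ) < threshold:
                group.append(v)
                assigned.add(v)
        clusters.append(group)
    return clusters


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, n_users, rounds = 5, 8, 200
    true_thetas = [rng.normal(size=dim) for _ in range(2)]  # two latent user groups
    est = {u: UserEstimator(dim) for u in range(n_users)}
    for _ in range(rounds):
        u = int(rng.integers(n_users))
        x = rng.normal(size=dim)                             # observed context
        r = true_thetas[u % 2] @ x + 0.1 * rng.normal()      # noisy linear reward
        est[u].update(x, r)
    print(cluster_users(est, threshold=0.5))                 # recovers the two groups
```

As estimates sharpen with more observations, the recovered clusters stabilize; a full algorithm such as LOCB additionally has to account for estimation uncertainty and the exploration-exploitation trade-off, which this sketch omits.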