In real-world recommendation problems, especially those with a formidably large item space, users must gradually learn to estimate the utility of fresh recommendations from their experience with previously consumed items. This in turn affects their interaction dynamics with the system and can invalidate previous algorithms built on the assumption of an omniscient user. In this paper, we formalize a model to capture such "learning users" and design an efficient system-side learning solution, coined Noise-Robust Active Ellipsoid Search (RAES), to confront the challenges posed by the non-stationary feedback from such a learning user. Interestingly, we prove that the regret of RAES deteriorates gracefully as the convergence rate of user learning worsens, until it becomes linear when the user's learning fails to converge. Experiments on synthetic datasets demonstrate the strength of RAES for such a contemporaneous system-user learning problem. Our study provides a novel perspective on modeling the feedback loop in recommendation problems.