Conversational contextual bandits elicit user preferences by occasionally querying for explicit feedback on key-terms to accelerate learning. However, several aspects of existing approaches limit their performance. First, information gained from key-term-level conversations and arm-level recommendations is not appropriately incorporated to speed up learning. Second, it is important to ask explorative key-terms to quickly elicit the user's potential interests in various domains and thus accelerate the convergence of the preference estimate, which has never been considered in existing works. To tackle these issues, we first propose ``ConLinUCB'', a general framework for conversational bandits with better information incorporation, which combines arm-level and key-term-level feedback to estimate the user preference in a single step at each round. Based on this framework, we further design two bandit algorithms with explorative key-term selection strategies, ConLinUCB-BS and ConLinUCB-MCR. We prove tighter regret upper bounds for our proposed algorithms. In particular, ConLinUCB-BS achieves a regret bound of $O(\sqrt{dT\log T})$, improving on the previous result of $O(d\sqrt{T}\log T)$. Extensive experiments on synthetic and real-world data show significant advantages of our algorithms over the classic ConUCB algorithm in learning accuracy (up to 54\% improvement) and computational efficiency (up to 72\% improvement), demonstrating their potential benefit to recommender systems.
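To make the ``one-step'' information incorporation concrete, the following is a minimal sketch of such a combined estimator, in illustrative notation (a plain ridge regression with regularizer $\lambda$ and equal weighting of the two feedback sources; the framework's exact weighting may differ). With arm-level contexts $x_{a_s}$ and rewards $r_s$, and key-term contexts $\tilde{x}_k$ with conversational feedback $\tilde{r}_k$, both feedback types enter a single linear system:
$$\hat{\theta}_t = M_t^{-1} b_t, \qquad M_t = \lambda I + \sum_{s=1}^{t} x_{a_s} x_{a_s}^{\top} + \sum_{k \in \mathcal{K}_t} \tilde{x}_k \tilde{x}_k^{\top}, \qquad b_t = \sum_{s=1}^{t} r_s x_{a_s} + \sum_{k \in \mathcal{K}_t} \tilde{r}_k \tilde{x}_k,$$
where $\mathcal{K}_t$ denotes the set of key-terms queried up to round $t$. Solving this one system yields the preference estimate directly, in contrast to maintaining and reconciling separate arm-level and key-term-level estimators.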