We consider the query recommendation problem in closed-loop interactive learning settings such as online information gathering and exploratory analytics. The problem can be naturally modelled using the Multi-Armed Bandits (MAB) framework with countably many arms. Standard MAB algorithms for countably many arms begin by selecting a random set of candidate arms and then applying a standard MAB algorithm, e.g., UCB, on this candidate set downstream. We show that such a selection strategy often results in higher cumulative regret, and to this end we propose a selection strategy based on the maximum utility of the arms. We show that in tasks like online information gathering, where sequential query recommendations are employed, the sequences of queries are correlated and the number of potentially optimal queries can be reduced to a manageable size by selecting queries with maximum utility with respect to the currently executing query. Our experimental results, using the log file of a recent real-world online literature discovery service, demonstrate that the proposed arm selection strategy substantially improves the cumulative regret with respect to state-of-the-art baseline algorithms. Our data model and source code are available at \url{https://anonymous.4open.science/r/0e5ad6b7-ac02-4577-9212-c9d505d3dbdb/}.
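The baseline strategy described above (pick a random candidate subset, then run UCB on it) and the proposed alternative (rank arms by a utility score and keep the top ones) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the UCB1 index, and the stand-in `utility` callable are all assumptions introduced here for clarity.

```python
import math
import random

def ucb_on_candidates(candidates, pull, horizon):
    """Run the standard UCB1 algorithm on a fixed candidate set.

    `pull(arm)` returns a reward in [0, 1]. Both names are
    illustrative placeholders, not from the paper.
    """
    counts = {a: 0 for a in candidates}
    sums = {a: 0.0 for a in candidates}
    total_reward = 0.0
    for t in range(1, horizon + 1):
        # Play every arm once before applying the UCB1 index.
        untried = [a for a in candidates if counts[a] == 0]
        if untried:
            arm = untried[0]
        else:
            arm = max(candidates,
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
        total_reward += r
    return total_reward

def select_random(arms, k):
    # Baseline: a uniformly random candidate subset.
    return random.sample(arms, k)

def select_by_utility(arms, k, utility):
    # Proposed idea (sketched): keep the k arms with the highest
    # utility, e.g. utility with respect to the current query.
    return sorted(arms, key=utility, reverse=True)[:k]
```

With a utility score that correlates with an arm's expected reward, the top-k candidate set concentrates on near-optimal arms, so the downstream UCB run wastes fewer pulls exploring poor arms than it would on a random subset.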