Interactive Information Retrieval (IIR) and Reinforcement Learning (RL) share many commonalities, including an agent that learns while interacting, a long-term and complex goal, and an algorithm that explores and adapts. To successfully apply RL methods to IIR, one challenge is obtaining sufficient relevance labels to train the RL agents, which are notoriously sample-inefficient. However, in a text corpus annotated for a given query, irrelevant documents vastly outnumber relevant ones. This imbalance yields highly skewed training experiences for the agent and prevents it from learning any effective policy. Our paper addresses this issue by using domain randomization to synthesize additional relevant documents for training. Our experimental results on the Text REtrieval Conference (TREC) Dynamic Domain (DD) 2017 Track show that the proposed method boosts an RL agent's learning effectiveness by 22\% in dealing with unseen situations.
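The abstract does not specify the exact randomization scheme, so the following is only a minimal illustrative sketch of how domain randomization could synthesize pseudo-relevant documents from a small pool of labeled ones; the function name `synthesize_relevant_doc` and the perturbation probabilities are hypothetical, not the paper's actual method.

\begin{verbatim}
import random

def synthesize_relevant_doc(doc_tokens, vocab,
                            drop_prob=0.1, swap_prob=0.1, seed=None):
    """Create one synthetic pseudo-relevant document by randomly
    dropping or substituting tokens of a known relevant document."""
    rng = random.Random(seed)
    synthetic = []
    for tok in doc_tokens:
        r = rng.random()
        if r < drop_prob:
            continue                             # randomly drop this token
        if r < drop_prob + swap_prob:
            synthetic.append(rng.choice(vocab))  # randomly substitute a token
        else:
            synthetic.append(tok)                # keep the token unchanged
    return synthetic

# Usage: expand a small pool of relevant documents before RL training.
relevant = ["solar", "panel", "efficiency", "improves", "with", "cooling"]
vocab = ["energy", "thermal", "module", "output", "system"]
augmented = [synthesize_relevant_doc(relevant, vocab, seed=i)
             for i in range(5)]
\end{verbatim}

Each call produces a slightly different variant of the same relevant document, giving the agent a more balanced mix of relevant and irrelevant training experiences.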