Ranking lies at the core of many Information Retrieval (IR) tasks. While existing research on Learning to Rank (LTR) with Deep Neural Networks (DNNs) has achieved great success, it is limited by its dependence on fine-grained labels. In practice, fine-grained labels are often expensive to acquire (e.g., explicit relevance judgements) or suffer from biases (e.g., click logs). Coarse-grained labels, by comparison, are easier and cheaper to collect. Some recent works propose using only coarse-grained labels for LTR tasks. The most representative line of work introduces Reinforcement Learning (RL) algorithms, which can train an LTR model with less reliance on fine-grained labels than Supervised Learning. To study the effectiveness of RL-based LTR algorithms on coarse-grained labels, we implement four different RL paradigms and conduct extensive experiments on two well-established LTR datasets. The results on simulated coarse-grained labeled datasets show that, while training an RL model for LTR with coarse-grained labels still cannot outperform traditional approaches that use fine-grained labels, it achieves somewhat promising results and is potentially helpful for future research in LTR. Our code implementations will be released after this work is accepted.