Because annotation is expensive, making the best use of existing human-created training data is an important research direction. We therefore carry out a systematic evaluation of the transferability of BERT-based neural ranking models across five English datasets. Previous studies focused primarily on zero-shot and few-shot transfer from a large dataset to a dataset with a small number of queries. In contrast, each of our collections has a substantial number of queries, which enables a full-shot evaluation mode and improves the reliability of our results. Furthermore, since the licenses of source datasets often prohibit commercial use, we compare transfer learning to training on pseudo-labels generated by a BM25 scorer. We find that training on pseudo-labels, possibly followed by fine-tuning on a modest number of annotated queries, can produce a model that is competitive with or better than one obtained by transfer learning. However, the stability and/or effectiveness of few-shot training needs to be improved: in some cases, it degrades the performance of a pretrained model.
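To make the idea of BM25-generated pseudo-labels concrete, the following is a minimal sketch in pure Python. It is not the authors' pipeline: the whitespace tokenizer, the parameter values `k1=1.2` and `b=0.75`, the helper name `pseudo_label`, and the choice to treat the top-ranked document as a pseudo-positive and the bottom-ranked documents as pseudo-negatives are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization.
    return text.lower().split()


class BM25:
    """Small in-memory BM25 scorer (illustrative, not optimized)."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.doc_tokens = [tokenize(d) for d in docs]
        self.doc_len = [len(t) for t in self.doc_tokens]
        self.avg_len = sum(self.doc_len) / len(docs)
        self.tf = [Counter(t) for t in self.doc_tokens]
        # Document frequency: number of documents containing each term.
        df = Counter()
        for tokens in self.doc_tokens:
            df.update(set(tokens))
        n = len(docs)
        # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5))
            for term, f in df.items()
        }

    def score(self, query, i):
        """BM25 score of document i for the given query string."""
        total = 0.0
        for term in tokenize(query):
            f = self.tf[i].get(term, 0)
            if f == 0:
                continue
            length_norm = 1 - self.b + self.b * self.doc_len[i] / self.avg_len
            total += self.idf[term] * f * (self.k1 + 1) / (f + self.k1 * length_norm)
        return total


def pseudo_label(bm25, query, num_pos=1, num_neg=2):
    """Hypothetical helper: build (query, positive_id, negative_id) triples.

    Assumption: top-ranked documents serve as pseudo-positives and
    bottom-ranked documents as pseudo-negatives.
    """
    ranked = sorted(
        range(len(bm25.doc_tokens)),
        key=lambda i: bm25.score(query, i),
        reverse=True,
    )
    positives = ranked[:num_pos]
    negatives = ranked[-num_neg:]
    return [(query, p, n) for p in positives for n in negatives]


# Toy usage with a made-up four-document collection.
docs = [
    "neural ranking with bert",
    "cooking pasta at home",
    "bm25 ranking baseline",
    "weather forecast today",
]
bm25 = BM25(docs)
triples = pseudo_label(bm25, "bert ranking")
```

Triples of this form could then be fed to a pairwise ranking loss when training a neural ranker without human relevance judgments; any subsequent fine-tuning on annotated queries would replace the pseudo-labels with real ones.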