BERT-based Neural Ranking Models (NRMs) can be classified by how the query and document are encoded through BERT's self-attention layers: bi-encoder versus cross-encoder. Bi-encoder models are highly efficient because all documents can be pre-processed before query time, but their performance is inferior to that of cross-encoder models. Both types of model use a ranker that takes BERT representations as input and produces a relevance score as output. In this work, we propose a method that applies multi-teacher distillation to a cross-encoder NRM and a bi-encoder NRM to produce a bi-encoder NRM with two rankers. The resulting student bi-encoder achieves improved performance by simultaneously learning from a cross-encoder teacher and a bi-encoder teacher, and by combining the relevance scores from its two rankers. We call this method TRMD (Two Rankers and Multi-teacher Distillation). In our experiments, TwinBERT and ColBERT serve as baseline bi-encoders. When monoBERT is used as the cross-encoder teacher, together with either TwinBERT or ColBERT as the bi-encoder teacher, TRMD produces a student bi-encoder that outperforms the corresponding baseline bi-encoder. For P@20, the maximum improvement was 11.4% and the average improvement was 6.8%. As an additional experiment, we applied TRMD to produce cross-encoder students and found that it improves cross-encoders as well.