We study hybrid search in text retrieval, where lexical and semantic search are fused on the intuition that the two are complementary in how they model relevance. In particular, we examine fusion by a convex combination (CC) of lexical and semantic scores, as well as the Reciprocal Rank Fusion (RRF) method, and identify their advantages and potential pitfalls. Contrary to existing studies, we find that RRF is sensitive to its parameters; that the learning of a CC fusion is generally agnostic to the choice of score normalization; that CC outperforms RRF in both in-domain and out-of-domain settings; and finally, that CC is sample efficient, requiring only a small set of training examples to tune its single parameter to a target domain.
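To make the two fusion functions concrete, the following is a minimal sketch rather than the authors' implementation. It assumes min-max score normalization, a CC weight named `alpha` applied to the semantic score, an RRF constant named `eta` with the commonly used default of 60, and that a document missing from one retriever's results contributes nothing from that retriever. All function and parameter names are hypothetical.

```python
from typing import Dict

Scores = Dict[str, float]  # document id -> raw retrieval score


def min_max(scores: Scores) -> Scores:
    """Rescale scores to [0, 1]; one possible normalization (an assumption)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def convex_combination(lexical: Scores, semantic: Scores, alpha: float = 0.5) -> Scores:
    """Fuse normalized scores as alpha * semantic + (1 - alpha) * lexical."""
    lex, sem = min_max(lexical), min_max(semantic)
    docs = set(lex) | set(sem)
    # Assumption: a document absent from one list gets a normalized score of 0 there.
    return {d: alpha * sem.get(d, 0.0) + (1.0 - alpha) * lex.get(d, 0.0) for d in docs}


def ranks(scores: Scores) -> Dict[str, int]:
    """Map each document to its 1-based rank, highest score first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {doc: i + 1 for i, doc in enumerate(ordered)}


def reciprocal_rank_fusion(lexical: Scores, semantic: Scores, eta: float = 60.0) -> Scores:
    """Fuse by summing 1 / (eta + rank) over the lists that contain the document."""
    fused: Scores = {}
    for ranking in (ranks(lexical), ranks(semantic)):
        for doc, rank in ranking.items():
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (eta + rank)
    return fused


if __name__ == "__main__":
    lexical = {"a": 10.0, "b": 5.0, "c": 0.0}   # e.g. BM25-style scores (made up)
    semantic = {"a": 0.2, "b": 0.8, "d": 0.5}   # e.g. cosine similarities (made up)
    print(convex_combination(lexical, semantic, alpha=0.5))
    print(reciprocal_rank_fusion(lexical, semantic))
```

Note that CC uses the score magnitudes, whereas RRF discards them and keeps only rank positions. In this sketch, tuning CC to a target domain amounts to searching over the single value `alpha` on a small labeled set.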