Existing hybrid retrievers, which integrate sparse and dense retrievers, are indexing-heavy, limiting their applicability in real-world on-device settings. We ask: "Is it possible to reduce the indexing memory of hybrid retrievers without sacrificing performance?" Driven by this question, we leverage an indexing-efficient dense retriever (i.e., DrBoost) to obtain a light hybrid retriever. Moreover, to further reduce memory, we introduce a lighter dense retriever (LITE), which is jointly trained with contrastive learning and knowledge distillation from DrBoost. Compared to previous heavy hybrid retrievers, our Hybrid-LITE retriever saves 13× memory while maintaining 98.0% of the performance. In addition, we study the generalization of light hybrid retrievers along two dimensions: out-of-domain (OOD) generalization and robustness against adversarial attacks. We evaluate models on two existing OOD benchmarks and create six adversarial attack sets for robustness evaluation. Experiments show that our light hybrid retrievers are more robust than both sparse and dense retrievers. Nevertheless, there is substantial room to improve the robustness of retrievers, and our datasets can aid future research.
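To make concrete what "integrating sparse and dense retrievers" can mean, the following is a minimal sketch of one common fusion scheme: min-max normalize each retriever's scores, then interpolate them linearly. The abstract does not state the fusion rule used here, so the function names (`min_max_normalize`, `hybrid_rank`), the `weight` parameter, and the choice of linear interpolation are all illustrative assumptions rather than the authors' method.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score map becomes all zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(
    sparse: Dict[str, float],
    dense: Dict[str, float],
    weight: float = 0.5,  # hypothetical interpolation weight, not from the paper
) -> List[Tuple[str, float]]:
    """Fuse sparse and dense scores by linear interpolation (an assumed scheme).

    A document missing from one retriever's candidate list contributes 0
    for that retriever after normalization.
    """
    sparse_n = min_max_normalize(sparse)
    dense_n = min_max_normalize(dense)
    doc_ids = set(sparse_n) | set(dense_n)
    fused = {
        d: weight * sparse_n.get(d, 0.0) + (1.0 - weight) * dense_n.get(d, 0.0)
        for d in doc_ids
    }
    # Sort by descending fused score, breaking ties by document id.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Toy scores for illustration only; real scores would come from a
    # sparse retriever (e.g. BM25) and a dense retriever.
    sparse_scores = {"d1": 10.0, "d2": 5.0, "d3": 0.0}
    dense_scores = {"d2": 4.0, "d3": 3.0, "d4": 0.0}
    for doc_id, score in hybrid_rank(sparse_scores, dense_scores):
        print(doc_id, round(score, 3))
```

In a scheme like this, the memory cost of the hybrid system is dominated by whichever index is larger, which is why swapping in a more compact dense index reduces the overall footprint.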