Sparse lexical representation learning has driven notable progress in passage retrieval effectiveness, as demonstrated by recent models such as DeepImpact, uniCOIL, and SPLADE. This paper describes a straightforward yet effective approach to sparsifying lexical representations for passage retrieval: building on SPLADE, we introduce a top-$k$ masking scheme to control sparsity and a self-learning method that coaxes masked representations to mimic their unmasked counterparts. A basic implementation of our model is competitive with more sophisticated approaches and strikes a good balance between effectiveness and efficiency. The simplicity of our methods opens the door to future explorations in lexical representation learning for passage retrieval.
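The abstract does not spell out implementation details, but the core top-$k$ masking idea admits a very compact realization. The following is a minimal PyTorch sketch, assuming SPLADE-style non-negative term weights over the vocabulary; the function names, the choice of $k$, and the use of an MSE self-learning loss between masked and unmasked representations are illustrative assumptions, not the authors' exact formulation.

```python
import torch


def top_k_mask(term_weights: torch.Tensor, k: int) -> torch.Tensor:
    """Keep only the k largest term weights per passage; zero out the rest.

    term_weights: (batch, vocab_size) non-negative lexical weights,
    e.g. SPLADE-style log(1 + ReLU(logits)) pooled over tokens.
    """
    _, topk_idx = term_weights.topk(k, dim=-1)
    mask = torch.zeros_like(term_weights)
    mask.scatter_(-1, topk_idx, 1.0)  # 1.0 at the top-k positions, 0.0 elsewhere
    return term_weights * mask


# Hypothetical usage with BERT's 30522-term vocabulary and k = 256.
reps = torch.relu(torch.randn(2, 30522))           # stand-in for encoder output
masked_reps = top_k_mask(reps, k=256)              # at most 256 non-zeros per passage

# One plausible self-learning signal: push the masked representation toward
# the unmasked one, treating the unmasked side as a fixed target (detached).
self_learning_loss = torch.nn.functional.mse_loss(masked_reps, reps.detach())
```

Under this reading, $k$ directly controls the sparsity of the inverted index, while the self-learning term compensates for the information dropped by the hard mask.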