With the recent success of dense retrieval methods based on bi-encoders, studies have applied this approach to a variety of downstream retrieval tasks with good efficiency and in-domain effectiveness. Dense retrieval models have recently also appeared in Math Information Retrieval (MIR) tasks, but the most effective systems remain classic retrieval methods that rely on hand-crafted structure features. In this work, we try to combine the best of both worlds: a well-defined structure search method for effective formula search, and efficient bi-encoder dense retrieval models to capture contextual similarities. Specifically, we evaluate two representative bi-encoder models, one for token-level and one for passage-level dense retrieval, on recent MIR tasks. Our results show that bi-encoder models are highly complementary to existing structure search methods, and we are able to advance the state of the art on MIR datasets.
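To make the idea of combining a structure search method with a bi-encoder concrete, one common way to merge two ranked lists is linear interpolation of normalized scores. The sketch below is illustrative only: the abstract does not state how the two systems are combined, so the min-max normalization, the weight `alpha`, and the function names `min_max_normalize` and `fuse` are all assumptions rather than the authors' method.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(
    structure_scores: Dict[str, float],
    dense_scores: Dict[str, float],
    alpha: float = 0.5,  # hypothetical interpolation weight, not from the paper
) -> List[Tuple[str, float]]:
    """Interpolate two score lists and return documents ranked best-first.

    A document missing from one list contributes 0.0 for that list.
    """
    s = min_max_normalize(structure_scores)
    d = min_max_normalize(dense_scores)
    fused = {
        doc: alpha * s.get(doc, 0.0) + (1.0 - alpha) * d.get(doc, 0.0)
        for doc in set(s) | set(d)
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Toy scores for illustration only; these are not results from the paper.
structure = {"a": 10.0, "b": 5.0, "c": 0.0}
dense = {"b": 0.9, "c": 0.5, "d": 0.1}
ranked = fuse(structure, dense, alpha=0.5)
```

In this toy example, document `b` scores moderately in both lists and ends up ranked first, which is the kind of complementarity a hybrid system would try to exploit. In practice `alpha` would be tuned on held-out data.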