Neural Retriever and Go Beyond: A Thesis Proposal

Information Retrieval (IR) aims to find documents (e.g., snippets, passages, and articles) relevant to a given query at large scale. IR plays an important role in many tasks that require external knowledge, such as open-domain question answering and dialogue systems. In the past, search algorithms based on term matching were widely used. Recently, neural-based algorithms (termed neural retrievers), which can mitigate the limitations of traditional methods, have gained more attention. Despite the success achieved by neural retrievers, they still face many challenges, e.g., suffering from limited training data and failing to answer simple entity-centric questions. Furthermore, most existing neural retrievers are developed for pure-text queries, which prevents them from handling multi-modal queries (i.e., queries composed of a textual description and images). This proposal has two goals. First, we introduce methods that address the above-mentioned issues of neural retrievers from three angles: new model architectures, IR-oriented pretraining tasks, and large-scale training data generation. Second, we identify future research directions and propose potential corresponding solutions.
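
The abstract contrasts term-matching search with neural retrievers. The sketch below is illustrative only and does not reproduce the proposal's own models. It assumes whitespace tokenization, uses the standard Okapi BM25 formula as the term-matching baseline, and uses hand-picked toy vectors as stand-ins for the embeddings a trained neural encoder would produce (the proposal does not specify an encoder here). The function names `bm25_scores` and `dense_top_k` are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Term-matching baseline: Okapi BM25 over lowercased, whitespace-split text."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for q in query.lower().split():
            if q not in tf:
                continue  # no exact lexical overlap -> this term contributes nothing
            idf = math.log(1 + (n - df[q] + 0.5) / (df[q] + 0.5))
            norm = tf[q] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[q] * (k1 + 1) / norm
        scores.append(score)
    return scores


def dense_top_k(query_vec, doc_vecs, k=1):
    """Neural-retriever-style scoring: inner product of query and passage embeddings.

    The vectors are assumed to come from some trained encoder (hypothetical here).
    Returns the indices of the k highest-scoring passages.
    """
    scores = [sum(q * d for q, d in zip(query_vec, dv)) for dv in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    docs = ["the automobile was repaired", "the cat sat on the mat"]
    # "car repair" shares no exact token with either passage, so BM25 gives both 0.0.
    print(bm25_scores("car repair", docs))
    # Toy embeddings in which "car repair" lies close to the automobile passage.
    print(dense_top_k([0.9, 0.1], [[0.8, 0.2], [0.1, 0.9]], k=1))
```

Under BM25, the query "car repair" scores zero against "the automobile was repaired" because no token matches exactly. This vocabulary mismatch is a well-known weakness of term matching. A dense retriever compares vectors instead of tokens, so it can still rank that passage first, provided the encoder places semantically related texts close together.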
