Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from those in these other application areas. A common form of IR involves ranking documents--or short passages--in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms--such as a person's name or a product model number--not seen during training, and should avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, retrieval involves extremely large collections--such as the document index of a commercial Web search engine--containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to retrieve efficiently from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed and where it should be positioned among other results. Exposure-aware IR systems may optimize for additional objectives besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks.
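
As a rough illustration of the inverted index mentioned above, the following minimal Python sketch maps each term to the documents that contain it and answers a keyword query by intersecting posting lists. The function names (`build_inverted_index`, `retrieve`) and the three-document toy corpus are assumptions made for this example. They are not taken from the thesis, and the sketch does not represent the neural methods it proposes.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive whitespace tokenization; real systems use proper analyzers.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def retrieve(index, query):
    """Return IDs of documents that contain every query term (boolean AND)."""
    postings = [set(index.get(term, ())) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical toy corpus, for illustration only.
docs = {
    1: "neural ranking models for web search",
    2: "inverted index data structures for search engines",
    3: "speech recognition with deep neural networks",
}
index = build_inverted_index(docs)
print(retrieve(index, "neural search"))
```

Only the posting lists of the query terms are touched at query time, which is why this kind of structure scales to large collections. It matches terms exactly, however, so on its own it does not address the vocabulary mismatch problem described above.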
