Query understanding is a fundamental problem in information retrieval (IR) that has attracted continuous attention over the past decades. Many tasks have been proposed for understanding users' search queries, e.g., query classification and query clustering. However, understanding a search query at the level of an intent class or cluster is imprecise, because much detailed information is lost. In many benchmark datasets, e.g., TREC and SemEval, queries are often accompanied by a detailed description, written by human annotators, that clearly states the query's intent and helps evaluate the relevance of documents. If a system could automatically generate a detailed and precise intent description for a search query, as human annotators do, that would indicate that much better query understanding has been achieved. In this paper, therefore, we propose a novel Query-to-Intent-Description (Q2ID) task for query understanding. Unlike existing ranking tasks, which leverage the query and its description to compute the relevance of documents, Q2ID is a reverse task that aims to generate a natural language intent description based on both the relevant and irrelevant documents of a given query. To address this new task, we propose a novel Contrastive Generation model, CtrsGen for short, which generates the intent description by contrasting the relevant documents with the irrelevant documents for a given query. We demonstrate the effectiveness of our model by comparing it with several state-of-the-art generation models on the Q2ID task. We also discuss the potential usage of the Q2ID technique through an example application.