Word embeddings are vital descriptors of words in unigram representations of documents for many tasks in natural language processing and information retrieval. Representing queries is one of the most critical challenges in this area, because a query consists of only a few terms and therefore has little descriptive capacity. Strategies such as averaging word embeddings can enrich a query's descriptive capacity, since the continuous vector representations that characterize these approaches favor the identification of related terms. We propose a data-driven strategy to combine word embeddings. We use IDF (inverse document frequency) combinations of embeddings to represent queries, and show that these representations outperform the average word embeddings recently proposed in the literature. Experimental results on benchmark data show that our proposal performs well, suggesting that data-driven combinations of word embeddings are a promising line of research in ad-hoc information retrieval.
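The contrast between a plain average and an IDF combination of word embeddings can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this assumes an IDF-weighted mean with a smoothed IDF; the function names, the toy corpus, and the two-dimensional embeddings are hypothetical and serve only as illustration.

```python
import math
from collections import Counter

import numpy as np


def compute_idf(docs):
    """Smoothed IDF per term: log((N + 1) / (df + 1)) + 1 (an assumed variant)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((n_docs + 1) / (df + 1)) + 1.0 for t, df in doc_freq.items()}


def average_embedding(terms, emb):
    """Baseline: unweighted mean of the embeddings of the known query terms."""
    vecs = [emb[t] for t in terms if t in emb]
    return np.mean(vecs, axis=0) if vecs else None


def idf_weighted_embedding(terms, emb, idf, default_idf=1.0):
    """IDF-weighted mean: rarer terms contribute more to the query vector."""
    known = [t for t in terms if t in emb]
    if not known:
        return None
    weights = np.array([idf.get(t, default_idf) for t in known])
    matrix = np.stack([emb[t] for t in known])
    return weights @ matrix / weights.sum()


# Toy corpus and embeddings (hypothetical values, for illustration only).
docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
emb = {"the": np.array([1.0, 0.0]), "cat": np.array([0.0, 1.0])}
idf = compute_idf(docs)

query = ["the", "cat"]
avg_vec = average_embedding(query, emb)
idf_vec = idf_weighted_embedding(query, emb, idf)
```

In this toy setting "the" occurs in every document and "cat" in only two, so the IDF-weighted vector leans toward the embedding of "cat", whereas the plain average treats both terms equally.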