Query expansion is the process of reformulating an original query by adding relevant words. Choosing which terms to add, whether to improve the performance of the query expansion method or to enhance the quality of the retrieved results, is an important aspect of any information retrieval system. Adding words that positively affect the quality of the search query, or that are sufficiently informative, plays an important role in retrieving relevant documents on a given topic and can thereby improve the efficiency of the information retrieval system. Typically, query expansion techniques add words to, or substitute words in, a given search query in order to collect relevant data. In this paper, we design and implement a pipeline for automated query expansion. We outline several tools that use different methods to expand the query. Our methods depend on targeting emergent events in streaming data over time and on finding the hidden topics in the targeted documents using probabilistic topic models. We employ Dynamic Eigenvector Centrality to flag emergent events and Latent Dirichlet Allocation to discover the topics. We also use an external data source as a secondary stream to supplement the primary stream with relevant words, and we expand the query using words from both the primary and secondary streams. An experimental study is performed on Twitter data (the primary stream) related to the events that took place during the protests in Baltimore in 2015. The quality of the retrieved results was measured using quality indicators of the streaming data: tweet count, hashtag count, and hashtag clustering.
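To make the centrality-driven step more concrete, the following is a minimal sketch under stated assumptions, not the pipeline described in this paper. It assumes that each time window of the stream is a list of tokenized tweets, builds a term co-occurrence graph per window, computes eigenvector centrality by plain power iteration, and adds to the query the terms whose centrality rose most between two windows. The function names (`cooccurrence_graph`, `eigenvector_centrality`, `expand_query`), the parameter `k`, and the sample tweets are all hypothetical, and the centrality difference between windows is a simplified stand-in for Dynamic Eigenvector Centrality. The topic-modeling step and the secondary stream are not covered.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(docs):
    """Undirected weighted graph: terms are nodes, co-occurrence counts are edge weights."""
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in docs:
        for a, b in combinations(sorted(set(tokens)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def eigenvector_centrality(graph, iterations=200, tol=1e-9):
    """Power iteration on (A + I), rescaled so the largest score is 1."""
    nodes = list(graph)
    if not nodes:
        return {}
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Adding the previous score (the "+ I" term) avoids oscillation
        # on bipartite graphs without changing the eigenvectors.
        new = {
            n: score[n] + sum(w * score[m] for m, w in graph[n].items())
            for n in nodes
        }
        norm = max(new.values())
        new = {n: v / norm for n, v in new.items()}
        if max(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    return score


def expand_query(query, previous_window, current_window, k=2):
    """Append up to k terms whose centrality increased most between windows."""
    prev = eigenvector_centrality(cooccurrence_graph(previous_window))
    curr = eigenvector_centrality(cooccurrence_graph(current_window))
    # Assumption: a term absent from the previous window has centrality 0 there.
    gain = {t: c - prev.get(t, 0.0) for t, c in curr.items() if t not in query}
    ranked = sorted(gain, key=lambda t: (-gain[t], t))
    return list(query) + [t for t in ranked[:k] if gain[t] > 0]


# Hypothetical tokenized tweets for two consecutive time windows.
previous_window = [["baltimore", "weather"], ["baltimore", "orioles"]]
current_window = [
    ["baltimore", "protest", "curfew"],
    ["baltimore", "protest"],
    ["protest", "curfew"],
    ["baltimore", "orioles"],
]

expanded = expand_query(["baltimore"], previous_window, current_window)
print(expanded)
```

In this toy example the terms "protest" and "curfew" become central only in the second window, so they are the ones appended to the query, while "orioles" loses centrality and is left out. A fuller version would presumably also draw candidate terms from the discovered topics and from the secondary stream.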