Applying Transfer Learning for Improving Domain-Specific Search Experience Using Query to Question Similarity

From arXiv. 8 pages; accepted in the Proceedings of the 3rd International Conference on Algorithms, Computing and Artificial Intelligence (ACAI), 2020.

Search is one of the most common platforms used to seek information. However, users are often overwhelmed by the number of results when they use such a platform to resolve their queries. Nowadays, direct answers to queries are provided as part of the search experience, and the question-answer (QA) retrieval process plays a significant role in enriching that experience. Most off-the-shelf Semantic Textual Similarity models work well for well-formed search queries, but their performance degrades in domain-specific settings where incomplete or grammatically ill-formed queries are prevalent. In this paper, we discuss a framework for calculating similarities between a given input query and a set of predefined questions in order to retrieve the question that matches it most closely. We have used it for the financial domain, but the framework generalizes to any domain-specific search engine and can be used in other domains as well. We use a Siamese network [6] over Long Short-Term Memory (LSTM) [3] models to train a classifier that generates unnormalized and normalized similarity scores for a given pair of questions. For each question pair, we also calculate three other similarity scores: the cosine similarity between their average word2vec embeddings [15], the cosine similarity between their sentence embeddings [7] generated using RoBERTa [17], and a customized fuzzy-match score. Finally, we develop a metaclassifier using Support Vector Machines [19] that combines these five scores to detect whether a given pair of questions is similar. We benchmark our model's performance against existing state-of-the-art (SOTA) models on the Quora Question Pairs (QQP) dataset as well as on a dataset specific to the financial domain.
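As a rough illustration of the query-to-question matching described in the abstract, the sketch below computes two of the five scores for each query/question pair: the cosine similarity between averaged word vectors and a fuzzy-match score. Everything concrete here is an assumption made for illustration: the 3-dimensional `TOY_VECTORS` stand in for trained word2vec embeddings, `difflib.SequenceMatcher` stands in for the paper's customized fuzzy match, and the unweighted mean in `best_match` is only a placeholder for the SVM metaclassifier. The Siamese LSTM scores and the RoBERTa sentence-embedding score are omitted entirely.

```python
import math
from difflib import SequenceMatcher

# Hypothetical 3-dimensional vectors standing in for trained word2vec embeddings.
TOY_VECTORS = {
    "open": [0.9, 0.1, 0.0],
    "savings": [0.2, 0.8, 0.3],
    "account": [0.1, 0.9, 0.1],
    "interest": [0.0, 0.2, 0.9],
    "rate": [0.1, 0.1, 0.9],
}
DIM = 3


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (tok.strip("?.,!") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def average_embedding(text, vectors=TOY_VECTORS):
    """Mean vector of the known tokens; zero vector if no token is known."""
    known = [vectors[tok] for tok in tokenize(text) if tok in vectors]
    if not known:
        return [0.0] * DIM
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fuzzy_score(a, b):
    """Character-level similarity ratio in [0, 1] (assumed stand-in)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_features(query, question):
    """Two of the five scores; the model-based scores are not reproduced."""
    embedding_score = cosine(average_embedding(query), average_embedding(question))
    return [embedding_score, fuzzy_score(query, question)]


def best_match(query, questions):
    """Return (question, score) with the highest placeholder combined score."""
    scored = []
    for question in questions:
        features = pair_features(query, question)
        # Placeholder combiner: the paper trains an SVM over five scores instead.
        scored.append((question, sum(features) / len(features)))
    return max(scored, key=lambda item: item[1])


QUESTIONS = ["How do I open a savings account?", "What is the interest rate?"]
match, score = best_match("open savings account", QUESTIONS)
print(match, round(score, 3))
```

In a fuller version, each pair's feature vector would hold all five scores and be passed to a trained classifier rather than averaged, so that the relative weight of each score is learned from labeled question pairs.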
