Search is one of the most common ways to seek information, but users are often overloaded with results when they use a search platform to resolve their queries. Increasingly, search engines provide direct answers to queries as part of the search experience, and question-answer (QA) retrieval plays a significant role in enriching that experience. Most off-the-shelf Semantic Textual Similarity models work well for well-formed search queries, but their performance degrades in domain-specific settings where incomplete or grammatically ill-formed queries are prevalent. In this paper, we discuss a framework that calculates similarities between a given input query and a set of predefined questions in order to retrieve the question that matches it best. We have applied it to the financial domain, but the framework generalizes to any domain-specific search engine. We use a Siamese network [6] over Long Short-Term Memory (LSTM) [3] models to train a classifier that generates unnormalized and normalized similarity scores for a given pair of questions. For each question pair, we also calculate three other similarity scores: the cosine similarity between their average word2vec embeddings [15], the cosine similarity between their sentence embeddings [7] generated using RoBERTa [17], and a customized fuzzy-match score. Finally, we develop a metaclassifier using Support Vector Machines [19] that combines these five scores to detect whether a given pair of questions is similar. We benchmark our model's performance against existing state-of-the-art (SOTA) models on the Quora Question Pairs (QQP) dataset as well as on a dataset specific to the financial domain.
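To make two of the five similarity scores concrete, the following is a minimal sketch, not the implementation used in this work. It computes the cosine similarity between averaged word vectors and a fuzzy-match score for a query and a candidate question. Several parts are assumptions made purely for illustration: the toy three-dimensional vectors in `TOY_VECTORS` stand in for trained word2vec embeddings, `difflib.SequenceMatcher` stands in for the customized fuzzy-match score, and the plain mean in `best_match` is only a placeholder for the SVM metaclassifier that combines all five scores.

```python
import math
from difflib import SequenceMatcher

# Hypothetical toy vectors; a real system would load trained word2vec embeddings.
TOY_VECTORS = {
    "open": [0.9, 0.1, 0.0],
    "create": [0.8, 0.2, 0.1],
    "account": [0.1, 0.9, 0.2],
    "savings": [0.2, 0.7, 0.6],
    "loan": [0.0, 0.3, 0.9],
}


def average_embedding(text, vectors):
    """Average the vectors of known tokens; return None if no token is known."""
    known = [vectors[tok] for tok in text.lower().split() if tok in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fuzzy_score(a, b):
    """Character-level similarity in [0, 1]; an assumed stand-in for the
    customized fuzzy-match score, whose definition is not given here."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_features(query, question, vectors=TOY_VECTORS):
    """Return [average-embedding cosine, fuzzy score] for one question pair."""
    emb_q = average_embedding(query, vectors)
    emb_c = average_embedding(question, vectors)
    if emb_q is None or emb_c is None:
        cos = 0.0  # no known tokens on one side: treat as no embedding evidence
    else:
        cos = cosine_similarity(emb_q, emb_c)
    return [cos, fuzzy_score(query, question)]


def best_match(query, candidates, vectors=TOY_VECTORS):
    """Pick the candidate with the highest mean feature score.
    The mean is a placeholder for a trained metaclassifier."""
    return max(candidates, key=lambda c: sum(pair_features(query, c, vectors)) / 2)
```

For example, `best_match("open savings acount", ["how to open a savings account", "apply for a loan"])` selects the first candidate even though the query is misspelled and incomplete: the known tokens still contribute to the averaged embedding, and the fuzzy score tolerates the typo.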