We study the coarse-grained selection module in retrieval-based chatbots. Coarse-grained selection is a basic module of a retrieval-based chatbot: it constructs a rough candidate set from the whole database to speed up the interaction with customers. So far, there are two kinds of approaches to coarse-grained selection: (1) sparse representation and (2) dense representation. To the best of our knowledge, there has been no systematic comparison between these two approaches in retrieval-based chatbots, and which kind of method is better in real scenarios is still an open question. In this paper, we first systematically compare these two methods. Extensive experimental results demonstrate that the dense representation method significantly outperforms the sparse representation method, but costs more time and storage. To overcome these critical weaknesses of the dense representation method, we also propose an ultra-fast, low-storage, and highly effective Deep Semantic Hashing Coarse-grained selection method, called the DSHC model. Specifically, in our proposed DSHC model, a hashing optimization module consisting of two auto-encoder models is stacked on a well-trained dense representation model, and three loss functions are designed to optimize it. The hash codes produced by the hashing optimization module effectively preserve the rich semantic and similarity information in the dense vectors. Extensive experimental results show that our proposed DSHC model achieves much faster speed and lower storage than sparse representation, with very little performance loss compared with dense representation. Our source code has been publicly released for future research.
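Why hash codes are cheaper than dense vectors for coarse-grained selection can be illustrated with a minimal sketch. This is not the DSHC model. As a hypothetical stand-in for the learned hashing optimization module (two auto-encoders trained with three losses), it simply binarizes each dense dimension by its sign and packs the bits into a Python integer. Candidates are then ranked by Hamming distance, computed as XOR followed by a popcount. The function names `to_hash_code`, `hamming`, and `coarse_select`, and the toy vectors, are all hypothetical.

```python
from typing import List, Sequence


def to_hash_code(vec: Sequence[float]) -> int:
    # Hypothetical stand-in for a learned hashing module (NOT the DSHC model):
    # bit i of the code is 1 iff vec[i] > 0.
    code = 0
    for i, x in enumerate(vec):
        if x > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    # Number of differing bits: XOR the two codes, then count the set bits.
    return bin(a ^ b).count("1")


def coarse_select(query: Sequence[float], db_codes: List[int], k: int) -> List[int]:
    # Rank all database codes by Hamming distance to the query code
    # (ties broken by index) and keep the top-k as the rough candidate set.
    q = to_hash_code(query)
    ranked = sorted(range(len(db_codes)), key=lambda i: (hamming(q, db_codes[i]), i))
    return ranked[:k]


if __name__ == "__main__":
    # Toy 4-dimensional "dense" vectors, hashed once offline and stored as integers.
    db = [[1.0, -1.0, 1.0, -1.0], [-1.0, -1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
    db_codes = [to_hash_code(v) for v in db]
    print(coarse_select([0.9, -0.2, 0.3, -0.7], db_codes, k=2))  # [0, 1]
```

On storage: a float32 vector of dimension d occupies 4d bytes (32d bits), whereas a d-bit code occupies d/8 bytes, so at equal dimension the binary code is 32 times smaller. Comparing two codes also reduces to an XOR and a bit count instead of d floating-point multiply-adds. The code lengths and dimensions actually used by DSHC are not stated in this abstract.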