Question retrieval is a key task in question answering and has attracted considerable attention from both academia and industry. Previous solutions rely mainly on translation models, topic models, and deep learning techniques. In contrast, we propose to construct fine-grained semantic representations of a question by assigning a learned importance score to each keyword, which yields semantic representations of different lengths and enables fine-grained question matching. Accordingly, we propose a multi-view semantic matching model that reuses the important keywords across multiple semantic representations. As the key to constructing these fine-grained semantic representations, we are the first to use a cross-task weakly supervised extraction model, which applies question-question labelled signals to supervise the keyword extraction process (i.e., to learn keyword importance). The extraction model integrates deep semantic representations and lexical matching information with statistical features to estimate the importance of each keyword. We conduct extensive experiments on three public datasets, and the results show that our proposed model significantly outperforms state-of-the-art solutions.
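To make the idea of semantic representations of different lengths concrete, the following is a minimal sketch, not the model described above. It assumes the keyword importance scores are already available as a plain dictionary (in the paper they are learned by the weakly supervised extraction model), and it replaces the learned semantic matching with a simple Jaccard overlap per view. The names `build_views`, `jaccard`, and `multi_view_score`, as well as the example scores, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def build_views(tokens: Sequence[str],
                importance: Dict[str, float],
                lengths: Sequence[int] = (1, 2, 3)) -> List[List[str]]:
    """Return one keyword list per requested length.

    Each view keeps the top-k keywords by importance score, in their
    original order of appearance. Scores are assumed to be given; the
    paper learns them, this sketch does not.
    """
    unique = list(dict.fromkeys(tokens))  # deduplicate, keep first-seen order
    ranked = sorted(unique, key=lambda t: importance.get(t, 0.0), reverse=True)
    views = []
    for k in lengths:
        keep = set(ranked[:k])
        views.append([t for t in unique if t in keep])
    return views


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Set overlap between two keyword lists (stand-in for a learned matcher)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def multi_view_score(views_a: List[List[str]], views_b: List[List[str]]) -> float:
    """Average the per-view overlap across views of matching length."""
    pairs = list(zip(views_a, views_b))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical importance scores, for illustration only.
scores = {"reset": 0.9, "router": 0.8, "password": 0.7,
          "quickly": 0.2, "how": 0.1, "to": 0.05, "my": 0.05}

q1 = "how to reset my router password".split()
q2 = "reset router password quickly".split()

views_q1 = build_views(q1, scores)
views_q2 = build_views(q2, scores)
print(views_q1)
print(multi_view_score(views_q1, views_q2))
```

Because the most important keywords appear in every view, a match on them is counted several times, which is one simple way to read "reusing the important keywords in multiple semantic representations".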