This study reviews the approaches used for measuring sentence similarity. Measuring the similarity between natural language sentences is a crucial task for many Natural Language Processing applications, such as text classification, information retrieval, question answering, and plagiarism detection. The survey classifies approaches to calculating sentence similarity into three categories according to the methodology they adopt: word-to-word based, structure-based, and vector-based. These are the most widely used approaches to finding sentence similarity, and each one measures relatedness between short texts from a specific perspective. In addition, the datasets most commonly used as benchmarks for evaluating techniques in this field are introduced to give a complete view of the topic. Approaches that combine more than one perspective give better results. Moreover, structure-based similarity, which compares the structures of sentences, needs further investigation.
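To make two of the perspectives named above concrete, the following is a minimal sketch and is not taken from the survey. It assumes a simple regex tokenizer, uses exact-match word overlap (Jaccard) as a crude stand-in for word-level comparison, and uses cosine similarity over bag-of-words counts as one simple instance of a vector-based measure. The function names `tokenize`, `jaccard_similarity`, and `cosine_similarity` are illustrative choices only.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase the sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def jaccard_similarity(a, b):
    """Word overlap: shared distinct words divided by all distinct words."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # assumption: two empty sentences count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a, b):
    """Cosine of the angle between bag-of-words count vectors."""
    vec_a, vec_b = Counter(tokenize(a)), Counter(tokenize(b))
    # Only words present in both sentences contribute to the dot product.
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty sentence is similar to nothing
    return dot / (norm_a * norm_b)


s1 = "The cat sat on the mat"
s2 = "The cat lay on the mat"
print(round(jaccard_similarity(s1, s2), 3))  # 0.667
print(round(cosine_similarity(s1, s2), 3))   # 0.875
```

Neither score looks at word meaning or sentence structure, so "sat" and "lay" count as unrelated and reordering the words changes nothing. This is one way to see why combining several perspectives could help.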