Research on Answer Quality Evaluation for Community Question Answering Based on Multi-Dimensional Textual Features

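Project name: Research on Answer Quality Evaluation for Community Question Answering Based on Multi-Dimensional Textual Features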

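Project number: No. 61305089

Project type: Young Scientists Fund

Year approved: 2014

Discipline: Automation technology, computer technology

Project author: Su Qi (苏祺)

Affiliation: Peking University

Funding: 240,000 yuan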

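Abstract (translated from the Chinese): With the development of Web 2.0, social media has become one of the mainstream applications of the Internet. Because information that users publish on social media is not subject to the strict review mechanisms of traditional media, the problem of information quality has become increasingly prominent. Identifying high-quality user-generated content in social media poses a serious challenge for natural language processing and text mining. Taking community question answering (cQA), a typical social media application, as an example, this project proposes a text quality evaluation framework based on multi-dimensional features. Previous research has mainly inferred the quality of an answer's text from the authority of the user who provided it, modeled with "non-textual features"; this project instead uses "multi-dimensional textual content features" to evaluate answer quality in community question answering. The research focuses on (1) constructing the multi-dimensional evaluation framework; (2) extracting textual features on the different dimensions and learning to rank with them, in particular the textual representation of the semantic category of "trustworthiness"; (3) effectively integrating the evaluation factors of each dimension; and (4) using answer quality evaluation to improve retrieval ranking in community question answering. These results can directly improve the practical effectiveness of community question answering applications, and can also have an important influence on research into text quality evaluation.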
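As a rough illustration of focus (3), integrating the evaluations from each dimension, the following minimal sketch combines per-dimension scores into a single quality score with a weighted average and ranks answers by it. This is not the project's method: the abstract does not say how the dimensions are integrated, the dimension names other than "trustworthiness" are hypothetical, and the weights and the `Answer`, `quality`, and `rank_answers` names are assumptions made only for this example.

```python
from dataclasses import dataclass, field

# Hypothetical dimensions and weights. Only "trustworthiness" is named in the
# abstract; "relevance" and "informativeness" are assumptions for illustration.
WEIGHTS = {"trustworthiness": 0.5, "relevance": 0.3, "informativeness": 0.2}


@dataclass
class Answer:
    text: str
    # Per-dimension scores in [0, 1], assumed to come from upstream
    # feature extractors that are not shown here.
    scores: dict = field(default_factory=dict)


def quality(answer, weights=WEIGHTS):
    """Weighted average of the per-dimension scores (missing dimensions count as 0)."""
    total = sum(weights.values())
    weighted = sum(w * answer.scores.get(dim, 0.0) for dim, w in weights.items())
    return weighted / total


def rank_answers(answers, weights=WEIGHTS):
    """Return the answers sorted from highest to lowest combined quality."""
    return sorted(answers, key=lambda a: quality(a, weights), reverse=True)
```

A learned combination (for example, a learning-to-rank model trained on labeled answers) could replace the fixed weights; the fixed weighted average is used here only because it is the simplest form of integration.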

Keywords (translated from the Chinese): text quality; deception; social media; evidentiality

English abstract: With the development of Web 2.0 technology, social media has become one of the mainstream applications on the Web. Since anyone can publish content freely on social media platforms, the quality of user-generated content has become a major concern. Identifying high-quality content has accordingly become a challenging research topic for natural language processing and text mining. In this project, we work on a typical social media application, community question answering (cQA). We propose an effective strategy based on multi-dimensional textual features for assessing the quality of cQA answers. Unlike existing approaches, which predict answer quality from the authority of users as modeled by non-textual features, we propose to extract and use "multi-dimensional textual features". Accordingly, the main focuses of the project are: 1) how to construct a reasonable multi-dimensional framework for evaluating answer quality; 2) how to extract the textual features that contribute to answer quality on each dimension, especially for the semantic category of "trustworthiness"; 3) how to score the quality of social media answers by ensembling the evaluations on each dimension; and 4) how do we combine the score of answer quality into a probability graph mo

English keywords: text quality; deception; social media; evidentiality
