Analysis and Detection of Misinformation in Massive Social Media Data

Project title: Analysis and Detection of Misinformation in Massive Social Media Data

Project number: No.61272343

Project type: General Program

Year of approval: 2013

Discipline: Automation technology, computer technology

Principal investigator: Zhang Ming (张铭)

Affiliation: Peking University

Funding: 840,000 yuan (84万元)

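Chinese abstract (translated): The rapid growth of social media such as microblogs helps people obtain information more quickly. However, because anyone can become a publisher of information, a great deal of misinformation is mixed in, and it spreads faster, is more persuasive, and is harder to identify. There is therefore an urgent need for models and algorithms that measure the credibility of information automatically, efficiently, and accurately in order to detect misinformation. This project plans to build a large-scale, high-quality misinformation dataset for massive social media using a hierarchical refinement approach based on diverse sampling, event clustering, and semi-supervised annotation. Using this dataset, it will study misinformation comprehensively from three aspects (content, users, and propagation), with topic models, machine learning, regression analysis, and sociological diffusion theory as tools, to identify the basic features that characterize misinformation. Building on this feature analysis, it will construct a support vector regression (SVR) model that combines "content-user-propagation" features to judge the credibility of microblog posts, and a graphical model that jointly measures the credibility of users and of information, ultimately yielding a principled method for automatic misinformation detection. On the basis of this theoretical and technical research, the project will also develop an online early-warning and detection system for misinformation, supporting the stability and healthy development of social media.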
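As a minimal sketch of the SVR step, assuming each post has already been reduced to three hypothetical numeric scores (one each for content, user, and propagation features) and a credibility label in [0, 1], the code below trains a linear epsilon-insensitive SVR by batch subgradient descent in pure Python. The abstract does not specify the actual features, kernel, or solver, so the class name `LinearSVR`, its hyperparameters, and the toy data are illustrative assumptions; a production system would more likely rely on an established SVR library.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class LinearSVR:
    """Hypothetical linear epsilon-insensitive SVR, trained by batch subgradient
    descent on 0.5 * ||w||^2 + C * sum_i max(0, |y_i - (w . x_i + b)| - epsilon).
    """
    C: float = 1.0
    epsilon: float = 0.05
    lr: float = 0.01
    epochs: int = 2000
    w: List[float] = field(default_factory=list)
    b: float = 0.0

    def predict_one(self, x: Sequence[float]) -> float:
        return sum(wj * xj for wj, xj in zip(self.w, x)) + self.b

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        return [self.predict_one(x) for x in X]

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "LinearSVR":
        n_features = len(X[0])
        self.w = [0.0] * n_features
        self.b = 0.0
        for _ in range(self.epochs):
            # The regularizer 0.5 * ||w||^2 contributes w to the gradient; b is not regularized.
            grad_w = list(self.w)
            grad_b = 0.0
            for xi, yi in zip(X, y):
                residual = yi - self.predict_one(xi)
                # Points inside the epsilon tube contribute no loss and no gradient.
                if abs(residual) > self.epsilon:
                    sign = 1.0 if residual > 0 else -1.0
                    for j in range(n_features):
                        grad_w[j] -= self.C * sign * xi[j]
                    grad_b -= self.C * sign
            self.w = [wj - self.lr * gj for wj, gj in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b
        return self


if __name__ == "__main__":
    # Hypothetical features per post: [content score, user score, propagation score].
    X = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.2, 0.1, 0.3], [0.1, 0.3, 0.2], [0.5, 0.5, 0.5]]
    y = [0.9, 0.85, 0.15, 0.2, 0.5]  # toy credibility labels, not real data
    model = LinearSVR().fit(X, y)
    print(model.predict([[0.85, 0.85, 0.65], [0.15, 0.2, 0.25]]))
```

The epsilon-insensitive loss ignores residuals smaller than `epsilon`, so small labeling noise in credibility scores does not pull the fit; a kernelized SVR would replace the linear score `w . x + b` if nonlinear feature interactions mattered.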

Chinese keywords (translated): social media; topic model; user analysis; controversy analysis; rumor detection

English abstract: Social media sites have recently grown at an impressive rate and have become important tools for people to leverage the wisdom of the crowd. However, since anyone can be an information source, this rapid growth also lets misinformation spread more indiscriminately and more quickly to a larger number of people. It is therefore crucial to design algorithms that detect misinformation automatically and efficiently. This project aims to analyze and detect misinformation in large-scale social media data. First, we construct a large-scale misinformation dataset from cross-media data through diverse sampling strategies, temporal event clustering, and semi-supervised annotation. Second, we conduct a systematic analysis of misinformation from the perspectives of content, users, and diffusion, using methods such as topic models, machine learning, regression, hypothesis tests, and diffusion theory. Finally, we propose two models to detect misinformation automatically: a support vector regression (SVR) model based on the analyzed "content-user-diffusion" features, and a graphical model that incorporates the user-information network together with these features. Further, to demonstrate the practicality and feasibility of our study, we design an alert and retrieval system to benefit long-
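The idea of jointly scoring users and posts over the user-information network can be illustrated with a much simpler stand-in than a probabilistic graphical model: iterative score propagation on a bipartite user-post graph. In this sketch, a user's score is the mean score of the posts they spread, and a post's score blends its own prior (for example, an SVR output) with the mean score of its spreaders. It is not the project's model; the function `propagate_credibility`, the blending weight `alpha`, and the example users and posts are hypothetical.

```python
from typing import Dict, List, Tuple


def propagate_credibility(
    edges: List[Tuple[str, str]],
    post_prior: Dict[str, float],
    alpha: float = 0.5,
    iterations: int = 50,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Hypothetical joint scoring of users and posts on a bipartite graph.

    edges: (user, post) pairs meaning the user published or reposted the post.
    post_prior: initial credibility in [0, 1] for every post that appears in
        edges (for example, the output of a content-based model).
    alpha: weight on a post's own prior versus its spreaders' credibility.
    Returns (user_scores, post_scores).
    """
    users_of: Dict[str, List[str]] = {}
    posts_of: Dict[str, List[str]] = {}
    for user, post in edges:
        users_of.setdefault(post, []).append(user)
        posts_of.setdefault(user, []).append(post)

    post_score = dict(post_prior)
    user_score: Dict[str, float] = {}
    for _ in range(iterations):
        # A user's credibility is the mean credibility of the posts they spread.
        user_score = {
            user: sum(post_score[p] for p in posts) / len(posts)
            for user, posts in posts_of.items()
        }
        # A post's credibility blends its prior with its spreaders' mean credibility;
        # posts that nobody spread keep their prior.
        new_post_score: Dict[str, float] = {}
        for post, prior in post_prior.items():
            spreaders = users_of.get(post)
            if spreaders:
                neighbour_mean = sum(user_score[u] for u in spreaders) / len(spreaders)
            else:
                neighbour_mean = prior
            new_post_score[post] = alpha * prior + (1 - alpha) * neighbour_mean
        post_score = new_post_score
    return user_score, post_score


if __name__ == "__main__":
    # Toy graph: alice spreads p1/p2, bob spreads p3/p4, carol spreads p2/p3.
    edges = [("alice", "p1"), ("alice", "p2"), ("bob", "p3"),
             ("bob", "p4"), ("carol", "p2"), ("carol", "p3")]
    prior = {"p1": 0.9, "p2": 0.8, "p3": 0.2, "p4": 0.1}
    users, posts = propagate_credibility(edges, prior)
    print(users)
    print(posts)
```

For `alpha` in (0, 1], each round averages scores and then pulls every post back toward its prior, so the combined update is a contraction and the scores settle to a fixed point inside [0, 1]. In the toy graph, alice (who spreads the high-prior posts) ends above carol, who ends above bob.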

English keywords: Social Media; Topic Model; User Analysis; Controversy Analysis; Rumor Detection
