Research on Early Detection Techniques for Crowdsourced Astroturfing

Project title: Research on Early Detection Techniques for Crowdsourced Astroturfing

Project number: No.61303172

Project type: Young Scientists Fund project

Year of approval: 2014

Discipline: Automation technology, computer technology

Project author: Tian Guanhua (田冠华)

Affiliation: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)

Funding: 230,000 yuan

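Abstract (translated from the Chinese): Crowdsourced astroturfing harms society and disrupts normal order online, so it needs to be studied and curbed. However, research on crowdsourcing-based astroturfing is still at an early stage both in China and abroad, and no effective method for detecting it has yet been proposed. This project takes the early detection of crowdsourced astroturfing as its research goal. By combining results from natural language understanding and statistical machine learning, it proposes an automatic detection framework for posts written by paid posters (the "internet water army"), offering methods and approaches for the early detection of crowdsourced astroturfing. The research covers three topics: establishing multi-feature description and generation methods for crowdsourced astroturfing behavior, which turns the detection problem into a computable pattern classification problem; studying methods based on deep semantics for computing the similarity and relatedness of short texts, which is an important basis for letting machines understand post content at the semantic level and detect astroturfing; and building metric learning models and detection methods that incorporate information about how crowdsourcing campaigns operate, in order to address the difficulty of learning from large samples with multimodal distributions. An early detection platform for crowdsourced public opinion will be built to validate in practice the feature descriptions and metric learning models proposed in this project, and to explore methods for the early detection of crowdsourced astroturfing.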

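Keywords (translated from the Chinese): crowdsourcing; public opinion detection; deep semantic analysis; feature description; deep learning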

English abstract: Crowdsourced astroturfing is harmful to society and disrupts the normal order of the network, so it calls for research and management. However, research on this problem is still at a very early stage both at home and abroad, and the field lacks effective theory and related work on detecting paid posters. This project aims to propose a detection framework for crowdsourced astroturfing by combining natural language processing and statistical machine learning. The research content of the project includes: establishing a behavior description and feature generation algorithm for crowdsourced astroturfing, which transforms the detection problem into a computable pattern classification problem; studying algorithms for computing the similarity and relatedness of short messages, which help the machine understand the meaning of a post and provide important evidence for detecting crowdsourced astroturfing; and constructing metric learning models and detection models that use the grouping information of crowdsourcing, to solve the learning problem on large-scale samples with multimodal distributions. A crowdsourcing detection platform will be built to verify the proposed feature description and metric learning models and to address the detection problem at its root.

English keywords: crowdsourcing; astroturf detection; deep semantic processing; feature description; deep learning
