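Project title: Research on Online Detection of Social Network Spam Based on Multi-Source Template Reconstruction
Project number: No. 61472359
Project type: General Program
Approval year: 2015
Discipline: Automation technology; computer technology
Project author: He Qinming (何钦铭)
Affiliation: Zhejiang University (浙江大学)
Funding: 800,000 CNY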
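Chinese abstract: Spam in social networks is a growing problem. To evade detection, spammers increasingly generate spam from complex, variable templates. Such spam has no fixed common substring, contains noise, mixes multiple sources, and is partly sent through legitimate users' accounts, so existing methods cannot detect it effectively online. This project studies social network spam generated from templates without fixed common substrings and aims to explore methods for detecting template-generated spam online. From a real-time message stream that contains only a partial sample and mixes multiple sources, the different templates used by the spam are automatically distinguished, extracted, and reconstructed, so that they accurately reflect the essential characteristics of the spam and enable online detection, including 0-day spam. The project plans to study heuristic algorithms and methods including template reconstruction by majority merge and matrix transformation, online incremental clustering based on rare category mining, noise identification using sequence labeling, and same-source analysis of spam accounts based on community mining. These are intended to solve key scientific problems such as online separation of multi-source data, template reconstruction, and reduction of noise and erroneous data, and to keep the method real-time, accurate, and adaptive. The results will directly guide the construction of an online detection system for social network spam, protecting user security and the normal operation of social networks.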
Chinese keywords: Online social network; spam; template reconstruction; incremental clustering; community detection
English abstract: Spam campaigns in online social networks are increasing. To avoid detection, most spam campaigns generate content from complicated templates, which are characterized by the absence of invariant substrings, the prevalence of noise, and heterogeneity, and many of the messages are sent via normal accounts. All of these pose challenges to existing spam detection work. This proposal focuses on spam generated from templates without invariant substrings and on approaches for detecting it online. By automatically reconstructing multiple spam templates from an online message flow that contains only part of the whole spam set, spam can be detected online efficiently and accurately. We propose a template reconstruction algorithm based on majority merge and matrix transformation, online incremental clustering based on rare category mining, noise identification by sequence labeling, and spam account source analysis based on community detection. This research should solve the key problems in online social network spam detection, including reconstruction of templates without invariant substrings, online clustering of messages generated by multiple templates, and noise identification and error tolerance, and should provide online, accurate, and adaptive detection. The research will support building an effective online spam detection and filtering system for online social networks, protecting the security of their users.
English keywords: Online Social Network; Spam; Template Reconstruction; Incremental Clustering; Community Detection