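Project title: Massive Rating Data Analysis and User Behavior Modeling Based on Probabilistic Graphical Models
Project number: No.61472345
Project type: General Program (面上项目)
Year of approval: 2015
Discipline: Automation technology, computer technology
Project author: Yue Kun (岳昆)
Author affiliation: Yunnan University (云南大学)
Funding amount: 800,000 yuan (80万元)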
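Chinese abstract (translated): With the rapid development of Web 2.0 technology, user-generated data has grown sharply. User rating data in e-commerce and social network applications is rich in information about user behavior and provides a basis for research on user behavior modeling. Key techniques for analyzing massive user rating data and modeling behavior are a pressing problem both for user behavior analysis and prediction and for data-intensive computing applied to social data analysis. Starting from massive user rating data, this project uses latent variables to characterize user behavior, takes the Bayesian network with latent variables (latent variable model) as the theoretical basis for describing user behavior and as the basic framework for representing and reasoning about uncertain knowledge, and takes MapReduce as the technical means of processing massive data. It focuses on the construction, distributed storage, and incremental revision of time-series latent variable models that describe user behavior, and on probabilistic inference query processing methods for typical user behavior analysis applications such as rating prediction and abnormal behavior detection. The new methods will be analyzed theoretically and tested experimentally, and a corresponding software system will be developed. The results will provide effective supporting techniques for user behavior data analysis and new ideas for modeling dynamically evolving user behavior, and have significant theoretical and practical value.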
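Chinese keywords (translated): Massive rating data;User behavior modeling;Probabilistic graphical model;Data-intensive computing;Probabilistic inference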
English abstract: With the rapid development of Web 2.0, user-generated data has increased rapidly. User rating data in e-commerce and social network applications contains rich information about user behaviors and provides the basis for user behavior modeling. The underlying techniques for analyzing massive user rating data and modeling user behaviors are a critical problem for user behavior analysis and prediction, as well as for data-intensive computing in social data analysis. In this project, we start from massive user rating data and describe user behaviors by latent variables. We adopt the Bayesian network (BN) with latent variables (latent variable model) as the theoretical basis for describing user behaviors and as the basic framework for representing and inferring uncertain knowledge. Adopting MapReduce as the technical means for processing massive data, we focus on the construction, distributed storage, and incremental revision of the time-series latent variable model used to describe user behaviors, as well as on probabilistic inference query processing for typical applications of user behavior analysis, such as rating prediction and abnormal behavior detection. We further conduct theoretical analysis and empirical tests of the proposed methods and develop the corresponding software system. The research findings of this project will provide effective techniques for user behavior data analysis and novel ideas for modeling evolving user behaviors, and are valuable from both theoretical and practical perspectives.
English keywords: Massive rating data;User behavior modeling;Probabilistic graphical model;Data-intensive computing;Probabilistic inference