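Project title: Theory and Methods of Sparse Representation for "User Behavior Data"
Project number: No.61273294
Project type: General Program
Year of approval: 2013
Discipline: Automation technology; computer technology
Project author: 韩素青 (Han Suqing)
Author affiliation: 太原师范学院 (Taiyuan Normal University)
Funding amount: 460,000 CNY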
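Abstract (translated from the Chinese): Sparse representation is an important topic in machine learning research, and analyzing and processing "user behavior data" that carry user needs or preferences has been one of the main tasks raised by network service providers in recent years. In statistical machine learning, L1 regularization is the main route to a sparse representation of data. For "user behavior data", however, applying L1 regularization inevitably requires treating symbolic data, unreasonably, as continuous data. In fact, for a specific problem, if a suitable discernibility relation over samples can be defined on the symbolic data set, a sparse representation in the feature sense can be obtained from the intrinsic structure of the data, and a sparse representation in the sample sense can be obtained as well; this, however, is no longer a task for L1 regularization. Probabilistic graphical model theory has notable strengths in the sparse representation of data and in learning from sparse data. This project therefore attempts to draw on that theory and, building on symbolic machine learning methods, to develop representation theory and algorithms that can sparsify user behavior data. The aim is, on the one hand, to avoid the unreasonable "conversion of symbolic data to real numbers" and, on the other hand, to bypass relatively time-consuming computations such as least squares, so that the sparsification process and its results become interpretable.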
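Keywords (translated from the Chinese): user behavior data; attribute reduction oriented to user needs; multi-attribute decision making; cluster analysis;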
English abstract: Sparse representation is one of the significant research topics in machine learning. In recent years, network service providers have identified the analysis and processing of user behavior data, which reflects users' demands and preferences, as an important task. In statistical machine learning, L1 regularization is a crucial method for obtaining a sparse representation of data. However, when applied to user behavior data, L1 regularization requires symbolic data to be treated as continuous, which is unreasonable and unnecessary. In fact, for a specific problem, as long as a corresponding discernibility relation over samples is defined on the symbolic data set, it is possible to obtain a sparse representation in the feature sense, based on the internal structure of the data. A sparse representation in the sample sense can also be obtained, but this lies outside the scope of L1 regularization. This project developed representation theory and algorithms for sparsifying user behavior data, and made the sparsification process and its results interpretable. In addition, the research results are expected to promote research and development in this area.
English keywords: the data of user behavior; user-oriented attribute reduct; multi-attribute decision making; clustering analysis;