Project title: Research on Markov Logic Theory and Algorithms for Deep Analysis of Big Data
Project number: No.61303179
Project type: Young Scientists Fund
Year of approval: 2014
Discipline: Automation technology; computer technology
Principal investigator: Sun Zhengya (孙正雅)
Affiliation: Institute of Automation, Chinese Academy of Sciences
Funding: CNY 230,000
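Abstract (translated from Chinese): Markov logic, which closely combines first-order logic with probabilistic graphical models, is regarded as one of the most important tools for deep data analysis; however, most algorithms developed within this framework do not scale well. To improve the ability to extract knowledge and insight from big data, this project takes Markov logic as its theoretical framework and studies deep analysis of big data systematically from three aspects: feature representation, parameter optimization, and the construction of an incremental learning system. First, to address the diverse types and complex relations found in big data, the project draws on predictive clustering trees and frequent sequential pattern mining to study hierarchical concept learning over relational n-tuples, and on that basis proposes a novel structure learning algorithm that searches paths across heterogeneous relations and constructs logical rules automatically. Second, to address the sheer scale of big data, it draws on the theory of deep sum-product networks to develop new online parallel optimization algorithms for learning the parameters of uncertain rules. Finally, to accommodate the continual arrival of new data, it incorporates incremental learning into both feature representation and parameter optimization, and builds an incremental learning system for deep analysis of big data. The aim is to extract deep semantic information from big data quickly and accurately, in support of scientifically grounded, forward-looking decisions and judgments.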
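Keywords (translated from Chinese): big data; statistical relational learning; Markov logic; sum-product networks; incremental learning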
English abstract: Markov logic is regarded as one of the most important tools for deep data analysis because it combines the expressiveness of first-order logic with that of probabilistic graphical models. However, as we enter the "big data" era, the ever-rising scale of the data makes progress in this paradigm increasingly difficult. To enhance the ability to acquire knowledge and insights from big data, this project conducts a systematic study within the framework of Markov logic from three aspects: feature representation, parameter optimization, and incremental learning system building. To handle the varied types and complex relations of big data, we first develop an effective hierarchical conceptualization algorithm for relational n-tuples by drawing on predictive clustering trees and frequent sequential pattern mining. On this basis, a novel structure learning algorithm is designed to find paths between heterogeneous relations and automatically construct formulas. Furthermore, we introduce deep sum-product networks to address parameter learning for large-scale data, for which new online parallel optimization strategies are devised. Faced with the emergence of massive new data, we finally investigate feature representation and parameter optimization from an incremental learning perspective, and build an integrated system for in-dep
English keywords: Big Data; Statistical Relational Learning; Markov Logic; Sum-Product Networks; Incremental Learning