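Project title: Fuzzy Clustering for Entity Categorization in Heterogeneous Information Networks
Project number: No.61502420
Project type: Young Scientists Fund project
Year of approval: 2016
Discipline: Other
Project author: Mei Jianping (梅建萍)
Author affiliation: Zhejiang University of Technology (浙江工业大学)
Funding amount: 200,000 CNY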
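Chinese abstract (translated): Information networks formed by interrelated entities are a widespread form of data representation. Clustering the entities in a network is a basic way to understand and analyze the network's intrinsic structure, and a key preparatory step for further data processing. Heterogeneity, however, increases both the structural complexity of an information network and the number of ways it can be interpreted, which makes cluster analysis of such data a challenging research topic. This project aims to cluster the different types of entities in a heterogeneous information network simultaneously, and plans to study the data, the user, and the unified integration of both with clustering, as follows: (1) propose a representation model that mixes two-way and three-way relations, so as to describe the heterogeneous information network accurately while keeping model complexity at a suitable level; (2) drawing on fuzzy clustering theory, study methods and models for collaboratively clustering entities of different types based on multiple heterogeneous relations; (3) bring the user's initiative into clustering, studying different forms of user guidance and how to combine them effectively with the clustering model. Through this research, the project intends to create new theory for relational data clustering and new user-guided clustering methods, and to develop effective and practical clustering algorithms for entities in heterogeneous information networks, providing important technical support for heterogeneous information network analysis.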
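Chinese keywords (translated): machine learning and data mining; clustering methods; heterogeneous information network; relational data clustering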
English abstract: An information network formed by associated entities is an important form of data representation that arises in many real-world problems. Clustering the entities in an information network is a fundamental way to understand the network's underlying structure, and a critical preparatory step for further data processing. Nevertheless, heterogeneity increases the structural complexity of the information network and the variety of ways in which the data can be understood, which makes cluster analysis of such data a challenging research topic. In this proposal, we target the problem of clustering all types of entities in the network simultaneously by studying the data, the user, and their integration with the clustering model. Specifically, we focus on the following research problems: (1) a hybrid pairwise and three-wise relation representation model that describes the heterogeneous information network with improved model capacity while maintaining feasible complexity; (2) methods and models for collaborative clustering of multi-typed entities in the network, based on the fuzzy clustering framework; (3) incorporation of the user into clustering, by studying different forms of user guidance and their integration with the fuzzy clustering model. With these efforts, we aim to propose new methods for relational data clustering and user-guided clustering, and to develop novel, effective and feasible clustering models and algorithms for heterogeneous relational data, providing useful tools for the analysis and understanding of heterogeneous information networks.
English keywords: machine learning and data mining; clustering; heterogeneous information network; relational data clustering