Polymorphic Heterogeneous Machine Learning and Its Application to Big Data Mining

Project title: Polymorphic Heterogeneous Machine Learning and Its Application to Big Data Mining

Project number: No. 61473123

Project type: General Program

Year of approval: 2015

Discipline: Other

Project author: Yang Pei (杨沛)

Affiliation: South China University of Technology (华南理工大学)

Funding: 780,000 yuan

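Chinese abstract (translated): With the rise of big data mining, many important machine learning applications face the challenge of multiple types of heterogeneity and rarity coexisting. Examples include click fraud detection in search engines, malicious insider detection, online social media analysis, defect detection in semiconductor chip manufacturing, and brain image analysis. Heterogeneity includes task, view, instance, label, and oracle heterogeneity; rarity includes rare categories, outliers, and imbalance. Most current research, however, addresses only a single type of rarity or heterogeneity. We therefore pose a series of new research problems in which multiple types of heterogeneity and rarity coexist, and propose new models and algorithms for them, such as a bipartite-graph-based multi-view multi-task multi-instance learning model, a tripartite-graph-based multi-view multi-task multi-instance learning model, and a border-degree-based multi-view multi-task learning framework. Taking these as a starting point, we aim to establish a unified algorithmic framework for polymorphic heterogeneous machine learning that applies to a wide range of problems in which heterogeneity and rarity coexist. In addition, we analyze these algorithms theoretically from the perspectives of Rademacher complexity, generalization error bounds, and PAC learnability, in order to strengthen the theoretical foundations of the field.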

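Chinese keywords (translated): heterogeneous machine learning; rare category analysis; multi-task learning; multi-view learning; multi-instance learning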

English abstract: In the era of big data, the coexistence of multiple types of heterogeneity and rarity is one of the major challenges faced by many important real-world machine learning applications, such as click fraud detection, malicious insider detection, online social media analysis, defect detection in semiconductor manufacturing, and brain image analysis. The types of heterogeneity include task, view, instance, label, and oracle heterogeneity, while rarity may take the form of rare categories, outliers, or imbalance. However, most existing work focuses on a single type of heterogeneity or rarity. We therefore introduce a number of novel problems in which multiple types of heterogeneity and rarity coexist, and propose novel models that leverage both, such as a bipartite-graph-based multi-view multi-task learning framework, a tripartite-graph-based multi-view multi-task learning framework, and a multi-view multi-task learning model based on border degree. Starting from these typical problems, we plan to build a principled and unified framework for learning from multiple types of heterogeneity and rarity simultaneously. We also carry out theoretical analysis with respect to Rademacher complexity, generalization error bounds, and PAC learnability, so as to strengthen the theoretical basis of heterogeneous machine learning.

English keywords: Heterogeneous machine learning; Rare category analysis; Multi-task learning; Multi-view learning; Multi-instance learning
