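Project title: Latent Semantic Classification Models for High-Dimensional Multiple-Instance Data and Their Implementation
Project number: No.61305061
Project type: Young Scientists Fund
Year of approval: 2014
Discipline: Automation technology, computer technology
Project author: 吕艳萍 (Lü Yanping)
Affiliation: Xiamen University
Funding: CNY 260,000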
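Chinese abstract (translated): Classifying large-scale, high-dimensional multiple-instance data is a common problem that modern information engineering fields such as smart healthcare and bioinformatics urgently need to solve. This project studies new models and methods for classifying high-dimensional multiple-instance data from the perspective of the data's latent semantics, in order to address shortcomings of traditional methods such as ineffective distance measures and unreasonable learning assumptions; it lifts classification-model research to the semantic level. The main research topics are: semantic extraction and representation for large-scale instance sets, together with effective reconstruction of multiple-instance bags; multi-class latent semantic classification models in high-dimensional space, with the distance measure, mathematical model, and optimization strategy lifted into the latent semantic space; and, building on these, semi-supervised strategies for predicting the classes of unlabeled instances within bags. The advantage of a latent semantic classification model is that it reconstructs multiple-instance bags by considering both the overall differences between bags and the differences within bags, and that it extracts latent semantic features in high-dimensional space, so that data items are comparable and their semantic differences can also be compared. If successfully carried out, the project will build, at the semantic level, practical and generally applicable classification models and search algorithms for high-dimensional multiple-instance data, which is expected to improve classification performance on such data and to be of significance for the further application of classification algorithms.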
Chinese keywords (translated): classification; high-dimensional data; multiple-instance learning; latent semantic model;
English abstract: Classifying large-scale, high-dimensional multiple-instance data is a common problem in modern information engineering, for example in smart medicine and bioinformatics. Traditional classification methods have limitations on such data, such as ineffective similarity measures and unreasonable learning assumptions. Using latent semantic information to classify high-dimensional multiple-instance data can advance research on classification models. In this project, we will pursue the following research topics: semantic extraction and representation from large sets of instances, together with the reconstruction of bags in multiple-instance learning; multi-class latent semantic classification models in high-dimensional spaces, in which the dissimilarity measure, mathematical optimization model, and search strategy are lifted to the latent semantic level; and semi-supervised techniques for predicting the labels of unlabeled instances. The advantage of a latent semantic multiple-instance classification model is that it can reconstruct multiple-instance bags by taking into account both inter-bag and intra-bag differences. Moreover, it can establish a feature space defined by latent semantic features extracted from high-dimensional data, thus instances ar
English keywords: Classification; High-dimensional Data; Multiple Instance Learning; Latent Semantic Model;