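Project title: Research on Latent Variable Support Vector Machines for Multi-Instance Data Labeling
Project number: No.61202269
Project type: Young Scientists Fund
Year of approval: 2013
Discipline: Computer Science
Project author: 温雯 (Wen Wen)
Affiliation: 广东工业大学 (Guangdong University of Technology)
Funding amount: 230,000 CNY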
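Chinese abstract: Real-world objects are often composed of multiple patterns, and different combinations of patterns correspond to different high-level semantics. It is this many-to-many mapping that makes semantic understanding and labeling fundamentally difficult. One effective class of existing methods converts semantic labeling into a multi-instance learning problem. However, this approach faces two important scientific problems that remain unsolved: existing techniques consider only the influence of pattern combinations on labeling and ignore the influence of pattern structure, so key information in the data is not fully exploited; and existing techniques apply only to learning and labeling on small and medium-sized sample sets, leaving large-scale data labeling unsolved. These problems severely limit the practical effectiveness of the approach. The project will therefore carry out the following research: study the statistical influence of pattern structure on data labeling and, on that basis, explore multi-scale representations of semantic data and kernel construction methods, in order to solve the problems of describing pattern structure and measuring dissimilarity; and study adaptive solving algorithms for the latent variable support vector machine model, in order to solve the approximate learning problem for large-scale multi-instance data labeling. The project aims to reveal the underlying principles by which pattern structure affects data labeling, to propose solutions to the key difficulties of multi-instance data labeling, and to lay a theoretical and technical foundation for applying this approach in areas such as image understanding and topic extraction.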
Chinese keywords: multi-instance learning;pattern recognition;machine learning;sentiment analysis;text mining
English abstract: Objects in the real world commonly consist of multiple patterns. Different patterns are related to different semantics, which makes semantic understanding and labeling inherently difficult. One available approach is to transform the problem into a multiple instance learning procedure. However, two fundamental scientific problems remain unsolved for this method. First, because patterns are difficult to describe, existing techniques focus only on the influence of pattern sets on the labels, without considering the influence of pattern structure, so the structural information goes unused. Second, because of their computational complexity, existing techniques are suitable only for learning and labeling small data sets, not large-scale data. Motivated by these two problems, the proposed research has two main themes: (1) Based on a statistical analysis of the impact of pattern structure on labeling, find an appropriate multi-scale kernel construction method to describe the pattern structure and build a structure variance metric function. (2) Design adaptive algorithms for the approximate learning of large-scale multi-instance data labeling models based on latent support vector machines. Expected outcomes of the proposed rese
English keywords: multi-instance learning;pattern recognition;machine learning;sentiment analysis;text mining