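Project title: Feature Learning for Text Big Data Based on Ecological Succession
Project number: No.61502288
Project type: Young Scientists Fund
Approval year: 2016
Discipline: Other
Project author: 郭鑫 (Guo Xin)
Affiliation: 山西大学 (Shanxi University)
Funding: 200,000 CNY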
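Chinese abstract (English translation): With the arrival of the big data era, the rapid growth of text data poses major challenges for information processing and use. Feature selection is the foundation of machine learning algorithms and has a critical impact on their accuracy and efficiency. Starting from text big data, this project intends to study efficient feature learning algorithms based on ecological succession theory and to realize incremental, dynamic learning of the model. The main research topics are: 1. multi-granularity text feature modeling; 2. text feature succession models for real-time data; 3. automatic text feature learning methods at the document level and sentence level; 4. empirical research on text feature learning based on ecological succession. The research will provide a solid theoretical foundation for machine learning models in text mining and information retrieval, advance research on text feature dimensionality reduction, and offer practical application value for public opinion monitoring and semantic analysis.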
Chinese keywords (English translation): feature learning;ecological succession;feature modeling;text big data
English abstract: With the arrival of the big data era, the rapid growth of text data poses great challenges for information processing and utilization. Feature selection is the basis of machine learning algorithms and has a crucial impact on their accuracy and efficiency. Starting from text big data, this project intends to develop efficient feature learning algorithms based on ecological succession theory and to realize incremental, dynamic learning of the model. The main research contents include: 1. multi-granularity text feature modeling; 2. text feature succession modeling for real-time data; 3. automatic text feature learning methods at the document level and sentence level; 4. empirical research on text feature learning based on ecological succession. The research of this project will provide a solid theoretical foundation for machine learning models in text mining and information retrieval, promote the development of research on text feature dimensionality reduction, and provide practical application value for public opinion monitoring and semantic analysis.
English keywords: feature learning;ecological succession;feature modeling;text big data