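Project title: Research on Feature Representation and Similarity Measure Methods in Multivariate Time Series Data Mining
Project number: No.61300139
Project type: Young Scientists Fund project
Year of approval: 2014
Discipline: Automation technology, computer technology
Project author: Li Hailin (李海林)
Author affiliation: Huaqiao University (华侨大学)
Funding: 230,000 yuan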
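Chinese abstract: Multivariate time series are one of the main data objects studied in data mining, and feature representation and similarity measurement are basic yet important tasks in multivariate time series data mining; their quality directly affects the performance and effectiveness of subsequent mining algorithms. First, this project intends to use principal component analysis and independent component analysis for feature extraction and similarity measurement on multivariate time series data, addressing the inaccurate feature representation and similarity measurement caused by the curse of dimensionality and information redundancy, and analyzing the internal relationships within multivariate time series data from both global and local perspectives. Second, to deepen and extend the applicability of symbolic representation methods in multivariate time series data mining, the project explores how to use time-warping and non-time-warping approaches to effectively convert multivariate time series into univariate time series and to construct suitable symbolization models and similarity measures. In addition, to avoid false dismissals when the measures are applied to similarity search, the project explores how to combine the corresponding feature representations to construct measure functions that lower-bound the true distance. The feasibility of the results is verified on simulated multivariate time series data and real financial data, providing new theory and methods for research on feature representation and similarity measurement in multivariate time series data mining.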
Chinese keywords: multivariate time series; data mining; feature representation; similarity measure; dynamic time warping
English abstract: Multivariate time series are among the primary research objects in data mining, and feature representation and similarity measurement are basic and important tasks in multivariate time series data mining. Their quality directly affects the performance and effectiveness of the algorithms used in time series data mining. First, the project adopts principal component analysis and independent component analysis for feature extraction and similarity measurement on multivariate time series, addressing the inaccurate feature representation and similarity measurement caused by the curse of dimensionality and information redundancy. The relationships within multivariate time series data are analyzed from both global and local perspectives. Second, to deepen and extend the applicability of symbolic representation methods to multivariate time series, problems such as how to transform multivariate time series into univariate time series and how to construct the symbolization model and similarity measures are to be addressed by time-warping and non-time-warping methods respectively. In addition, we also discuss how to combine the corresponding feature representations to design similarity measure functions which satisfy lower boun
English keywords: Multivariate time series; Data mining; Feature representation; Similarity measure; Dynamic time warping