Many important real-world applications involve time-series data with skewed class distributions. Compared with conventional imbalanced learning problems, the classification of imbalanced time-series data is more challenging because of its high dimensionality and high inter-variable correlation. This paper proposes a structure-preserving Oversampling method for High-dimensional Imbalanced Time-series classification (OHIT). OHIT first uses a density-ratio based shared nearest neighbor clustering algorithm to capture the modes of the minority class in high-dimensional space. Then, for each mode, it applies a shrinkage technique for large-dimensional covariance matrices to obtain an accurate and reliable estimate of the covariance structure. Finally, OHIT generates structure-preserving synthetic samples from multivariate Gaussian distributions parameterized by the estimated covariance matrices. Experimental results on several publicly available time-series datasets (both unimodal and multimodal) demonstrate the superiority of OHIT over state-of-the-art oversampling algorithms in terms of F1, G-mean, and AUC. The code of OHIT is available at github.com/zhutuanfei/OHIT.
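The last two steps described above (per-mode covariance shrinkage followed by Gaussian sampling) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `shrink_covariance` and `oversample_modes` are hypothetical, the mode labels are assumed to come from a clustering step that is not shown, the shrinkage intensity `alpha` is a fixed placeholder rather than a data-driven estimate, and both the scaled-identity shrinkage target and the choice to center each Gaussian on the mode mean and allocate samples in proportion to mode size are assumptions made for illustration.

```python
import numpy as np


def shrink_covariance(X, alpha=0.2):
    """Shrink the sample covariance of X (n_samples x n_features) toward a scaled identity.

    alpha is a hypothetical fixed shrinkage intensity (assumption); a real
    estimator would choose it from the data.
    """
    n, d = X.shape
    centered = X - X.mean(axis=0)
    sample_cov = centered.T @ centered / max(n - 1, 1)
    mu = np.trace(sample_cov) / d  # average variance across dimensions
    return (1.0 - alpha) * sample_cov + alpha * mu * np.eye(d)


def oversample_modes(X_min, labels, n_new, alpha=0.2, seed=0):
    """Draw n_new synthetic minority samples, one Gaussian per mode.

    labels holds a mode index for every row of X_min (assumed to be given).
    """
    rng = np.random.default_rng(seed)
    modes = np.unique(labels)
    sizes = np.array([(labels == m).sum() for m in modes])
    # Assumption: allocate synthetic samples in proportion to mode size.
    counts = np.floor(n_new * sizes / sizes.sum()).astype(int)
    counts[: n_new - counts.sum()] += 1  # hand out the rounding remainder
    synthetic = []
    for m, k in zip(modes, counts):
        X_m = X_min[labels == m]
        cov = shrink_covariance(X_m, alpha)
        # Assumption: the Gaussian is centered on the mode mean.
        synthetic.append(rng.multivariate_normal(X_m.mean(axis=0), cov, size=k))
    return np.vstack(synthetic)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Toy minority class: two modes, 20-dimensional series, fewer samples than dimensions.
    X_min = np.vstack([rng.normal(0.0, 1.0, (6, 20)), rng.normal(4.0, 1.0, (9, 20))])
    labels = np.array([0] * 6 + [1] * 9)
    X_syn = oversample_modes(X_min, labels, n_new=30)
    print(X_syn.shape)
```

The sketch shows why shrinkage matters in this setting: with fewer samples than dimensions the raw sample covariance is singular, whereas the shrunk matrix is positive definite for any `alpha > 0` (given nonzero variance), so the Gaussian sampler has a well-conditioned covariance to draw from.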