Time Series Classification (TSC) has drawn considerable attention in the literature because of its broad range of applications in domains such as medical data mining and weather forecasting. Although TSC algorithms are typically designed for balanced datasets, most real-life time series datasets are imbalanced. A skewed class distribution is a problem for both distance-based and feature-based time series classification algorithms when class separability is poor. To address the imbalance problem, this paper uses both sampling-based and algorithmic approaches. These methods significantly improve time series classification performance on imbalanced datasets. Despite a high imbalance ratio, the results show that the F-score can be as high as 97.6% on the simulated TwoPatterns dataset.
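To make the terms above concrete, the following is a minimal sketch of the imbalance ratio, one simple sampling-based remedy, and the F-score. It assumes random oversampling of the minority class, which is only one possible sampling approach and is not necessarily the method used in this paper. The function names `imbalance_ratio`, `random_oversample`, and `f_score` are hypothetical and chosen for illustration.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the largest class size to the smallest class size."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(series, labels, seed=0):
    """Duplicate randomly chosen series of each smaller class until
    every class has as many examples as the largest class.
    Assumption: illustrative only, not the paper's sampling method."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(series), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(series, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def f_score(y_true, y_pred, positive):
    """F1 score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: nine majority-class series and one minority-class series.
toy_series = [[0.0, 0.1, 0.2]] * 9 + [[1.0, 0.9, 0.8]]
toy_labels = [0] * 9 + [1]
balanced_series, balanced_labels = random_oversample(toy_series, toy_labels)
```

On the toy data the imbalance ratio is 9 before resampling and 1 afterwards. The F-score is reported for the minority class here because plain accuracy can look high on imbalanced data even when the minority class is never predicted.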