Detecting and Classifying Outliers in Big Functional Data

We propose two new outlier detection methods for identifying and classifying different types of outliers in (big) functional data sets. The proposed methods are based on an existing method called Massive Unsupervised Outlier Detection (MUOD). MUOD detects and classifies outliers by computing, for each curve, three indices, all based on the concepts of linear regression and correlation, which measure outlyingness in terms of shape, magnitude and amplitude relative to the other curves in the data. The first method, 'Semifast-MUOD', uses a sample of the observations in computing the indices, while the second method, 'Fast-MUOD', uses the point-wise or $L_1$ median in computing the indices. The classical boxplot is used to separate the indices of the outliers from those of the typical observations. Performance evaluation of the proposed methods on simulated data shows significant improvements over MUOD, both in outlier detection and in computational time. We show that Fast-MUOD is especially well suited to handling big and dense functional data sets, with very small computational time compared to other methods. Further comparisons with some recent outlier detection methods for functional data show that the proposed methods achieve superior or comparable outlier detection accuracy. We apply the proposed methods to weather, population growth, and video data.
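
The Fast-MUOD idea described above can be illustrated with a minimal sketch. The abstract states only that the three indices rest on linear regression and correlation against the point-wise (or $L_1$) median and that a classical boxplot separates outliers from typical curves. The exact formulas below are therefore assumptions: shape as one minus the Pearson correlation, amplitude as the deviation of the regression slope from one, and magnitude as the absolute regression intercept. The function names `fast_muod_indices` and `boxplot_outliers` and the 1.5 whisker factor are also hypothetical choices made for illustration.

```python
import numpy as np


def fast_muod_indices(curves):
    """Regress each curve on the point-wise median curve.

    curves: array of shape (n_curves, n_points), one curve per row.
    Returns (shape, amplitude, magnitude) index arrays of length n_curves.
    The index definitions are assumptions, not taken from the abstract.
    Constant curves (zero variance) are not handled in this sketch.
    """
    curves = np.asarray(curves, dtype=float)
    ref = np.median(curves, axis=0)              # point-wise median curve
    ref_c = ref - ref.mean()                     # centred reference
    cur_c = curves - curves.mean(axis=1, keepdims=True)

    cov = cur_c @ ref_c                          # unnormalised covariances
    var_ref = ref_c @ ref_c                      # unnormalised variance of reference
    norm_cur = np.sqrt((cur_c ** 2).sum(axis=1))

    corr = cov / (norm_cur * np.sqrt(var_ref))   # Pearson correlation
    slope = cov / var_ref                        # least-squares slope
    intercept = curves.mean(axis=1) - slope * ref.mean()

    shape = np.abs(corr - 1.0)                   # 0 when perfectly correlated
    amplitude = np.abs(slope - 1.0)              # 0 when the scale matches
    magnitude = np.abs(intercept)                # 0 when there is no vertical shift
    return shape, amplitude, magnitude


def boxplot_outliers(index, whisker=1.5):
    """Flag index values above the upper boxplot fence Q3 + whisker * IQR."""
    q1, q3 = np.percentile(index, [25, 75])
    return index > q3 + whisker * (q3 - q1)
```

Because every curve is compared with a single reference curve rather than with every other curve, the cost of this sketch grows linearly in the number of curves, which is consistent with the small computational time claimed for Fast-MUOD. Applying `boxplot_outliers` to each index separately gives a flag per outlier type (shape, amplitude, magnitude).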
