Detecting and Classifying Outliers in Big Functional Data

We propose two new outlier detection methods for identifying and classifying different types of outliers in (big) functional data sets. The proposed methods are based on an existing method called Massive Unsupervised Outlier Detection (MUOD). MUOD detects and classifies outliers by computing, for each curve, three indices, all based on linear regression and correlation, that measure outlyingness in shape, magnitude, and amplitude relative to the other curves in the data. The first method, 'Semifast-MUOD', computes the indices using a sample of the observations, while the second, 'Fast-MUOD', computes them using either the point-wise median or the L1 median. The classical boxplot is used to separate the indices of the outliers from those of the typical observations. Performance evaluation of the proposed methods on simulated data shows significant improvements over MUOD in both outlier detection and computational time. We show that Fast-MUOD is especially well suited to big and dense functional data sets, requiring very little computational time compared to other methods. Further comparisons with some recent outlier detection methods for functional data show that the proposed methods achieve superior or comparable outlier detection accuracy. We apply the proposed methods to weather, population growth, and video data.
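To make the idea concrete, here is a minimal sketch of a Fast-MUOD-style computation. The abstract states only that the three indices rest on linear regression and correlation against a reference curve; the specific formulas below (one minus the Pearson correlation for shape, the absolute deviation of the regression slope from 1 for amplitude, and the absolute regression intercept for magnitude) are our assumption, as are the function names `fast_muod_indices` and `boxplot_outliers`, the 1.5 whisker factor, and the synthetic sine-curve data.

```python
import numpy as np


def fast_muod_indices(curves):
    """Return (shape, amplitude, magnitude) indices for each curve.

    curves: array of shape (n_curves, n_points), all on a common grid.
    Each curve is regressed on the point-wise median curve (assumed
    reference). Constant curves would cause a division by zero and are
    not handled in this sketch.
    """
    curves = np.asarray(curves, dtype=float)
    ref = np.median(curves, axis=0)          # point-wise median curve
    ref_c = ref - ref.mean()                 # centered reference
    cur_c = curves - curves.mean(axis=1, keepdims=True)

    n_points = ref.size
    cov = cur_c @ ref_c / n_points           # cov(curve_i, ref)
    var_ref = ref_c @ ref_c / n_points       # var(ref)
    std_cur = np.sqrt((cur_c ** 2).mean(axis=1))

    corr = cov / (std_cur * np.sqrt(var_ref))            # Pearson correlation
    slope = cov / var_ref                                # regression slope
    intercept = curves.mean(axis=1) - slope * ref.mean() # regression intercept

    shape = np.abs(1.0 - corr)       # assumed shape index
    amplitude = np.abs(slope - 1.0)  # assumed amplitude index
    magnitude = np.abs(intercept)    # assumed magnitude index
    return shape, amplitude, magnitude


def boxplot_outliers(index, whisker=1.5):
    """Positions whose index lies above the classical boxplot upper fence."""
    q1, q3 = np.percentile(index, [25, 75])
    return np.flatnonzero(index > q3 + whisker * (q3 - q1))


# Synthetic illustration: 50 noisy sine curves with three planted outliers.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
base = np.sin(2 * np.pi * t)
curves = base + 0.05 * rng.normal(size=(50, t.size))
curves[0] += 5.0                                                       # magnitude
curves[1] = 3.0 * base + 0.05 * rng.normal(size=t.size)               # amplitude
curves[2] = np.cos(2 * np.pi * t) + 0.05 * rng.normal(size=t.size)    # shape

shape, amplitude, magnitude = fast_muod_indices(curves)
shape_out = boxplot_outliers(shape)
amplitude_out = boxplot_outliers(amplitude)
magnitude_out = boxplot_outliers(magnitude)
```

Because every curve is compared with a single reference curve rather than with every other curve, the cost grows linearly with the number of curves, which is consistent with the computational advantage claimed for Fast-MUOD. Note that a curve can be flagged by more than one index: in this sketch the shifted-phase curve also has a slope far from 1.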
