Both clustering and outlier detection play an important role in the analysis of meteorological measurements. We present the AWT algorithm, a clustering algorithm for time series data that also performs implicit outlier detection during clustering. AWT integrates ideas from several well-known K-Means clustering algorithms. It chooses the number of clusters automatically based on a user-defined threshold parameter, and it can be used for heterogeneous meteorological input data as well as for data sets that exceed the available memory size. We apply AWT to crowdsourced 2-m temperature data with hourly resolution from the city of Vienna to detect outliers and to investigate whether the final clusters show general similarities as well as similarities with urban land-use characteristics. We show that both the outlier detection and the implicit mapping to land-use characteristics are possible with AWT, which opens new possible fields of application, specifically in the rapidly evolving field of urban climate and urban weather.
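The abstract does not spell out AWT's internals, so the following is only a minimal sketch of the general idea it describes: a threshold decides when a new cluster is opened (so the number of clusters is not fixed in advance), the data are consumed in a single pass (so they need not fit in memory), and sparsely populated clusters are treated as outliers. It is a generic leader-style threshold clustering, not the AWT algorithm itself. The function names, the RMSE distance, and the `min_size` outlier rule are all assumptions made for illustration.

```python
import math


def rmse(a, b):
    """Root-mean-square difference between two equally long series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def threshold_cluster(series_stream, threshold, min_size=2):
    """Single-pass threshold clustering (illustrative, NOT the AWT algorithm).

    series_stream: iterable of equally long numeric sequences; it is read
                   once, so it may be a generator over data on disk.
    threshold:     maximum RMSE to the nearest centroid for joining a cluster
                   (hypothetical stand-in for a user-defined threshold).
    min_size:      clusters with fewer members are reported as outliers
                   (an assumed rule, not taken from the source).
    """
    centroids = []  # running mean series of each cluster
    members = []    # indices of the series assigned to each cluster

    for idx, series in enumerate(series_stream):
        # Find the nearest existing centroid.
        best, best_dist = None, float("inf")
        for k, centroid in enumerate(centroids):
            dist = rmse(series, centroid)
            if dist < best_dist:
                best, best_dist = k, dist

        if best is not None and best_dist <= threshold:
            # Join the cluster and update its centroid as an incremental mean.
            members[best].append(idx)
            n = len(members[best])
            centroids[best] = [
                c + (x - c) / n for c, x in zip(centroids[best], series)
            ]
        else:
            # Too far from every centroid: open a new cluster.
            centroids.append(list(series))
            members.append([idx])

    clusters = [m for m in members if len(m) >= min_size]
    outliers = sorted(i for m in members if len(m) < min_size for i in m)
    return clusters, outliers


# Toy example: five short "temperature" series, one of them implausible.
stations = [
    [10.0, 11.0, 12.0],
    [10.2, 11.1, 12.1],
    [15.0, 16.0, 17.0],
    [15.1, 16.2, 16.9],
    [30.0, 5.0, 40.0],
]
clusters, outliers = threshold_cluster(stations, threshold=1.0)
print(clusters, outliers)
```

In this sketch a smaller threshold yields more, tighter clusters and flags more series as outliers, while a larger one merges more series into fewer clusters. The result of a single-pass scheme like this also depends on the order in which the series arrive.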