Solar activity is an important driver of long-term climate trends and must be accounted for in climate models. Unfortunately, direct measurements of this quantity over long periods do not exist. The only observations related to solar activity whose records reach back to the seventeenth century are those of sunspots. Surprisingly, determining the number of sunspots consistently over time remains a challenging statistical problem to this day. It arises from the need to consolidate data from multiple observing stations around the world in a context of low signal-to-noise ratios, non-stationarity, missing data, non-standard distributions and many kinds of errors. As a result, the data from some stations exhibit severe and varied deviations over time. In this paper, we propose the first systematic and thorough statistical approach for monitoring these complex and important series. It consists of three steps essential for successful treatment of the data: smoothing on multiple timescales, monitoring using block bootstrap calibrated CUSUM charts, and classifying out-of-control situations by support vector techniques. This approach allows us to detect a wide range of anomalies (such as sudden jumps or more progressive drifts) unseen in previous analyses. It helps us to identify the causes of major deviations, which are often observer or equipment related. Detecting and identifying these deviations will help improve future observations. Eliminating or correcting them in past data will lead to a more precise reconstruction of the world reference index for solar activity: the International Sunspot Number.
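To make the monitoring step concrete, the following is a minimal sketch of a two-sided CUSUM chart whose control limit is calibrated by a block bootstrap. It is an illustration under stated assumptions, not the procedure used in the paper: the input is assumed to be a series of standardized residuals from a station known to be in control, and the function names (`cusum_run_length`, `block_bootstrap`, `calibrate_limit`), the allowance `k`, the block size, the target in-control average run length and the search interval for the limit are all hypothetical choices. The smoothing and support vector classification steps are not shown.

```python
import numpy as np


def cusum_run_length(z, k, h):
    """Time of the first alarm of a two-sided CUSUM on z, or len(z) if none."""
    c_pos, c_neg = 0.0, 0.0
    for t, x in enumerate(z):
        c_pos = max(0.0, c_pos + x - k)  # accumulates upward deviations
        c_neg = min(0.0, c_neg + x + k)  # accumulates downward deviations
        if c_pos > h or c_neg < -h:
            return t + 1
    return len(z)  # censored: no alarm within the series


def block_bootstrap(z, length, block_size, rng):
    """Resample z in contiguous blocks to preserve short-range autocorrelation."""
    n_blocks = -(-length // block_size)  # ceiling division
    starts = rng.integers(0, len(z) - block_size + 1, size=n_blocks)
    return np.concatenate([z[s:s + block_size] for s in starts])[:length]


def calibrate_limit(z_ic, k=0.5, target_arl=200, block_size=20,
                    n_rep=200, seed=0):
    """Bisection on h so that the bootstrap in-control ARL is near target_arl.

    Assumption: z_ic holds standardized in-control residuals and the
    limit lies in the (hypothetical) search interval [0, 20].
    """
    rng = np.random.default_rng(seed)
    # Fixed set of resampled series: the mean run length is then
    # non-decreasing in h, so bisection is valid.
    series = [block_bootstrap(z_ic, 5 * target_arl, block_size, rng)
              for _ in range(n_rep)]
    lo, hi = 0.0, 20.0
    for _ in range(20):
        h = 0.5 * (lo + hi)
        arl = np.mean([cusum_run_length(s, k, h) for s in series])
        if arl < target_arl:
            lo = h  # alarms too frequent: raise the limit
        else:
            hi = h  # alarms too rare: lower the limit
    return 0.5 * (lo + hi)
```

With a limit `h` obtained from `calibrate_limit` on in-control data, a station's series would be flagged at the time returned by `cusum_run_length(z, k, h)`. Resampling in blocks rather than point by point is what lets the calibration account for autocorrelation in the residuals.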