The change-point detection problem has been widely studied in the time series and signal processing literature. Current methods can be summarized as a search for the appropriate partition of a whole time series, so that the problem can be approached as one of optimization. However, an exact optimization approach can be computationally expensive, and approximate approaches discard potential change-point configurations in a non-rigorous manner. A framework is therefore presented to detect change-points in a univariate time series using a decision criterion based on the Minimum Description Length (MDL), modified so that it includes a Bayesian analysis. To search for the change-points, the times at which deviations from the mean value occur (exceedances) are analyzed first; a genetic algorithm, using the MDL criterion described above as its fitness function, then evaluates which of these times could constitute a change-point. The effectiveness of the method is assessed through a simulation study. Its practical validity is analyzed on a real dataset of Particulate Matter of less than 2.5 microns (PM2.5) in Bogotá, Colombia, for the 2018-2020 period, under different settings chosen to understand the convergence of the algorithm. In most cases, this definition of the objective function yields better results for both the number of change-points and their locations in the series, reducing the error in comparison with other methods available in the literature.
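The pipeline described above (collect exceedances as candidate times, then let a genetic algorithm choose among them by minimizing a description-length score) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the names `exceedances`, `mdl_score` and `genetic_search` are hypothetical, the score is a generic two-part Gaussian code length without the Bayesian modification, and the threshold, population size and mutation rate are arbitrary.

```python
import math
import random


def exceedances(series, threshold):
    """Indices whose value deviates from the overall mean by more than threshold."""
    mean = sum(series) / len(series)
    return [t for t, x in enumerate(series) if abs(x - mean) > threshold]


def mdl_score(series, change_points):
    """Toy two-part code length (assumption: NOT the paper's Bayesian-MDL).

    Each segment pays a Gaussian fit cost plus a cost for its two parameters;
    each change-point pays log(n) for its location. Lower is better.
    """
    n = len(series)
    bounds = [0] + sorted(change_points) + [n]
    cost = len(change_points) * math.log(n)
    for start, end in zip(bounds[:-1], bounds[1:]):
        seg = series[start:end]
        m = len(seg)
        if m < 2:  # degenerate segment: reject this configuration
            return math.inf
        mu = sum(seg) / m
        var = max(sum((x - mu) ** 2 for x in seg) / m, 1e-12)
        cost += 0.5 * m * math.log(var) + math.log(m)
    return cost


def genetic_search(series, candidates, generations=50, pop_size=30,
                   p_mut=0.05, seed=0):
    """Search subsets of candidates (encoded as bitmasks) that minimize mdl_score."""
    rng = random.Random(seed)
    k = len(candidates)

    def decode(mask):
        return [c for c, bit in zip(candidates, mask) if bit]

    def fitness(mask):
        return mdl_score(series, decode(mask))

    # Start from the "no change-point" configuration plus sparse random masks.
    population = [[0] * k] + [
        [int(rng.random() < 0.05) for _ in range(k)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness)
        next_gen = population[:2]  # elitism: the two best masks survive unchanged
        while len(next_gen) < pop_size:
            # Tournament selection of two parents, three contenders each.
            p1 = min(rng.sample(population, 3), key=fitness)
            p2 = min(rng.sample(population, 3), key=fitness)
            # Uniform crossover followed by bit-flip mutation.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            next_gen.append(child)
        population = next_gen
    best = min(population, key=fitness)
    return decode(best), fitness(best)


# Synthetic series with a single mean shift at t = 50 (illustrative data only).
noise = random.Random(1)
series = [noise.gauss(0, 1) for _ in range(50)] + [noise.gauss(5, 1) for _ in range(50)]
candidates = exceedances(series, threshold=2.0)
best_cps, best_score = genetic_search(series, candidates)
print(best_cps, best_score)
```

Restricting the search to exceedance times shrinks the space of configurations the genetic algorithm has to explore, and because the best masks are carried over unchanged, the returned score can never be worse than that of the configuration with no change-points.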