This paper proposes methods for forecasting dynamic time series, including non-stationary ones, based on a linguistic approach: the study of the occurrence and repetition of so-called N-grams. In computational linguistics this approach is used to build statistical translators and to detect plagiarism and duplicate documents. Its scope can be extended beyond linguistics, however, by taking into account the correlations between sequences of stable word combinations, as well as trends. The proposed methods require neither a preliminary study of the characteristics of the time series nor complex tuning of the input parameters of the forecasting model. They make it possible to carry out, with a high level of automation, short-term and medium-term forecasts of time series characterized by trends and cyclicality, in particular series of publication dynamics in content monitoring systems. The methods can also be used to predict the parameter values of a large complex system in order to monitor its state, when the number of such parameters is large and a high level of automation of the forecasting process is therefore desirable. Significant advantages of the approach are that it does not require the time series to be stationary and that it has only a small number of tuning parameters. Further research may focus on various criteria for the similarity of time series fragments, on the use of nonlinear similarity criteria, and on ways to determine a rational quantization step for the time series automatically.
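To make the N-gram analogy concrete, the following is a minimal sketch and not the authors' method. It assumes uniform quantization with a fixed step, exact matching of N-grams (no similarity criterion and no trend handling), and a persistence fallback when the current N-gram has not been seen before. The function names `quantize`, `build_ngram_followers`, and `forecast` are hypothetical, chosen for illustration only.

```python
from collections import Counter, defaultdict


def quantize(series, step):
    """Map each value to an integer level index (a 'word' in the N-gram analogy)."""
    return [round(x / step) for x in series]


def build_ngram_followers(symbols, n):
    """For every N-gram in the symbol sequence, count which symbol followed it."""
    followers = defaultdict(Counter)
    for i in range(len(symbols) - n):
        followers[tuple(symbols[i:i + n])][symbols[i + n]] += 1
    return followers


def forecast(series, step, n, horizon):
    """Extend the series by `horizon` points using the most frequent N-gram follower."""
    symbols = quantize(series, step)
    followers = build_ngram_followers(symbols, n)
    predicted = []
    for _ in range(horizon):
        context = tuple(symbols[-n:])
        counts = followers.get(context)
        if counts:
            # Most frequent symbol observed after this N-gram in the history.
            nxt = counts.most_common(1)[0][0]
        else:
            # Assumed fallback: repeat the last level when the N-gram is unseen.
            nxt = symbols[-1]
        symbols.append(nxt)
        predicted.append(nxt * step)  # map the level index back to a value
    return predicted


# A purely cyclic toy series: the sketch continues the cycle.
print(forecast([1, 2, 3, 1, 2, 3, 1, 2], step=1, n=2, horizon=3))  # [3, 1, 2]
```

In this sketch the only tuning parameters are the quantization step and the N-gram length, which mirrors the small number of parameters mentioned in the abstract; choosing the step automatically is left open here, as it is in the text.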