We propose a novel sequence prediction method for sequential data that capture node traversals in graphs. Our method builds on a statistical modelling framework that combines multiple higher-order network models into a single multi-order model. We develop a technique to fit such multi-order models to empirical sequential data and to select the optimal maximum order. Our framework supports both next-element and full-sequence prediction given a sequence prefix of any length. We evaluate our model on six empirical data sets containing sequences from website navigation as well as public transport systems. The results show that our method outperforms state-of-the-art algorithms for next-element prediction. We further demonstrate the accuracy of our method in out-of-sample sequence prediction and confirm that it scales to data sets with millions of sequences.
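To make the idea of predicting from a prefix of any length concrete, the following is a minimal sketch in Python. It is not the fitting or order-selection procedure described above: it swaps in a plain back-off scheme that counts successors for every context of length 0 up to a fixed maximum order and predicts from the longest context seen in training. The function names (`fit_counts`, `predict_next`, `predict_sequence`) and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def fit_counts(sequences, max_order):
    """Count successor frequencies for every context of length 0..max_order.

    Assumption: a simple frequency table stands in for the multi-order model;
    no likelihood-based order selection is performed here.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for i, nxt in enumerate(seq):
            for k in range(max_order + 1):
                if i - k < 0:
                    break  # not enough history for a context of length k
                context = tuple(seq[i - k:i])
                counts[context][nxt] += 1
    return counts


def predict_next(counts, prefix, max_order):
    """Return the most frequent successor of the longest known suffix of prefix."""
    for k in range(min(max_order, len(prefix)), -1, -1):
        context = tuple(prefix[len(prefix) - k:])
        if context in counts:
            return counts[context].most_common(1)[0][0]
    return None  # only reached when the model was fitted on no data


def predict_sequence(counts, prefix, max_order, steps):
    """Greedily extend prefix by up to `steps` predicted elements."""
    out = list(prefix)
    for _ in range(steps):
        nxt = predict_next(counts, out, max_order)
        if nxt is None:
            break
        out.append(nxt)
    return out[len(prefix):]


# Toy node sequences (hypothetical data, e.g. page visits or station stops).
toy_sequences = [["a", "b", "c"], ["a", "b", "c"], ["d", "b", "e"]]
toy_counts = fit_counts(toy_sequences, max_order=2)
print(predict_next(toy_counts, ["d", "b"], max_order=2))
print(predict_sequence(toy_counts, ["a"], max_order=2, steps=2))
```

In this toy data the successor of `b` depends on the node visited before it, which a first-order model cannot express; the second-order context separates the two cases, and shorter contexts serve as a fallback for prefixes that were never observed.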