Finite order Markov models are theoretically well-studied models for dependent categorical data. Despite their generality, application in empirical work when the order is larger than one is quite rare. Practitioners avoid using higher order Markov models because (1) the number of parameters grow exponentially with the order, (2) the interpretation is often difficult. Mixture of transition distribution models (MTD) were introduced to overcome both limitations. MTD represent higher order Markov models as a convex mixture of single step Markov chains, reducing the number of parameters and increasing the interpretability. Nevertheless, in practice, estimation of MTD models with large orders are still limited because of curse of dimensionality and high algorithm complexity. Here, we prove that if only few lags are relevant we can consistently and efficiently recover the lags and estimate the transition probabilities of high order MTD models. The key innovation is a recursive procedure for the selection of the relevant lags of the model. Our results are based on (1) a new structural result of the MTD and (2) an improved martingale concentration inequality. Our theoretical results are illustrated through simulations.
翻译:马尔科夫模型在理论上是依赖性绝对数据的良好模型。尽管这些模型在理论上是经过很好研究的模型,但在一般情况下,当该序列大于一个序列时在实证工作中的应用相当少见。从业者避免使用更高顺序的马尔科夫模型,因为(1) 参数数随顺序成倍增长,(2) 解释往往很困难。为了克服这两种限制,采用了过渡性分布模型的混合体(MTD) 。MTD代表了较高顺序的Markov模型,作为单步马科夫链的组合体,减少了参数数量,增加了可解释性。然而,在实践上,对大型订单的MTD模型的估算仍然有限,因为对维度的诅咒和高算法复杂性。在这里,我们证明,如果只有很少的滞后点是相关的,我们可以持续和有效地恢复滞后,并估计高顺序MTD模型的过渡概率。关键创新是选择模型相关滞后的循环程序。我们的结果基于(1) MTD的新结构结果和(2) 改进的马丁格尔浓度不平等。我们的理论结果通过模拟来说明。