Markov chains with variable length are useful parsimonious stochastic models able to generate most stationary sequences of discrete symbols. The idea is to identify the suffixes of the past, called contexts, that are relevant for predicting the next symbol. Sometimes a single state is a context, and observing this state in the past makes the more distant past irrelevant. States with this property are called renewal states, and they can be used to split the chain into independent and identically distributed blocks. In order to identify renewal states in chains with variable length, we propose using the Intrinsic Bayes Factor to evaluate the hypothesis that a particular state is a renewal state. The difficulty lies in integrating the marginal posterior distribution of the random context trees under a general prior distribution on the space of context trees with a Dirichlet prior for the transition probabilities, and Monte Carlo methods are applied for this purpose. To show the strength of our method, we analyze artificial datasets generated from different binary models and one example from the field of Linguistics.
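To make the two central ingredients concrete, the sketch below (a minimal illustration, not the paper's actual procedure) shows how a candidate renewal state splits a binary sequence into blocks, and how the transition counts following a context yield a closed-form marginal likelihood under a symmetric Dirichlet prior; the function names, the toy sequence, and the choice of a symmetric Dirichlet(1) prior are assumptions made for illustration only.

```python
from collections import Counter
from math import lgamma

def split_at_renewal_state(sequence, renewal_symbol):
    """Split a symbol sequence into blocks that each end at an occurrence
    of the candidate renewal symbol.  If the symbol is a true renewal
    state, the resulting blocks are independent and identically distributed."""
    blocks, current = [], []
    for s in sequence:
        current.append(s)
        if s == renewal_symbol:
            blocks.append(current)
            current = []
    if current:                      # trailing partial block
        blocks.append(current)
    return blocks

def dirichlet_marginal_loglik(counts, alpha=1.0):
    """Log marginal likelihood of next-symbol counts at one context under a
    symmetric Dirichlet(alpha) prior on the transition probabilities
    (the standard Dirichlet-multinomial formula)."""
    k = len(counts)
    n = sum(counts)
    out = lgamma(k * alpha) - lgamma(k * alpha + n)
    for c in counts:
        out += lgamma(alpha + c) - lgamma(alpha)
    return out

# Toy binary chain; the state "1" is tested as a renewal state.
seq = [0, 1, 1, 0, 0, 1, 0, 1, 1]
blocks = split_at_renewal_state(seq, 1)
next_after_1 = Counter(seq[i + 1] for i, s in enumerate(seq[:-1]) if s == 1)
counts = [next_after_1.get(a, 0) for a in (0, 1)]
print(blocks, dirichlet_marginal_loglik(counts))
```

In the method described above, quantities of this kind would enter the Intrinsic Bayes Factor through the marginal posterior over context trees, which is not available in closed form for a general tree prior and is therefore approximated by Monte Carlo.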