Markov chains with variable length are useful parsimonious stochastic models able to generate most stationary sequences of discrete symbols. The idea is to identify the suffixes of the past, called contexts, that are relevant for predicting the next symbol. Sometimes a single state is a context: once this state is found in the past, everything that preceded it becomes irrelevant. Such states are called renewal states, and they split the chain into independent blocks. To identify renewal states for chains with variable length, we propose using the Intrinsic Bayes Factor to evaluate the plausibility of each set of renewal states. The difficulty lies in finding the marginal posterior distribution of the random context trees under a general prior distribution on the space of context trees and a Dirichlet prior for the transition probabilities. To show the strength of our method, we analyzed artificial datasets generated from two binary models and one example from the field of Linguistics.
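
To make the notions of context and renewal state concrete, here is a minimal Python sketch. It is not the method proposed above (no Intrinsic Bayes Factor is computed); it only illustrates how a context tree drives a binary chain. The context set `CONTEXTS`, its probabilities, and the helper names `find_context`, `simulate`, and `is_renewal_state` are all hypothetical choices made for illustration.

```python
import random

# Hypothetical context tree for a binary chain (an assumption, not from the paper).
# Keys are contexts written oldest symbol first; values are P(next symbol = 1).
CONTEXTS = {"1": 0.3, "10": 0.6, "00": 0.9}


def find_context(past, contexts=CONTEXTS):
    """Return the shortest suffix of `past` that is a context."""
    for k in range(1, len(past) + 1):
        suffix = past[-k:]
        if suffix in contexts:
            return suffix
    raise ValueError("past is too short to determine a context")


def simulate(n, seed=0, start="00"):
    """Generate n symbols; each draw depends only on the current context."""
    rng = random.Random(seed)
    seq = start
    while len(seq) < n:
        p_one = CONTEXTS[find_context(seq)]
        seq += "1" if rng.random() < p_one else "0"
    return seq


def is_renewal_state(state, contexts=CONTEXTS):
    """Check whether a single symbol is a renewal state of the tree.

    Assumes the context set is minimal. The symbol is a renewal state
    when no context looks past it, i.e. it appears in a context only
    as the oldest symbol.
    """
    return all(state not in c[1:] for c in contexts)


if __name__ == "__main__":
    print(simulate(40))
    print(is_renewal_state("1"), is_renewal_state("0"))
```

In this toy tree the symbol `1` is a renewal state: no context extends further back than the most recent `1`, so each occurrence of `1` starts a block that is independent of what came before. The symbol `0` is not, because predicting after a `0` requires knowing the symbol that preceded it.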