We study the problem of learning tree-structured Markov random fields (MRF) on discrete random variables with common support when the observations are corrupted by a $k$-ary symmetric noise channel with unknown probability of error. For Ising models (support size = 2), past work has shown that graph structure can only be recovered up to the leaf clusters (a leaf node, its parent, and its siblings form a leaf cluster) and exact recovery is impossible. No prior work has addressed the setting of support size of 3 or more, and indeed this setting is far richer. As we show, when the support size is 3 or more, the structure of the leaf clusters may be partially or fully identifiable. We provide a precise characterization of this phenomenon and show that the extent of recoverability is dictated by the joint PMF of the random variables. In particular, we provide necessary and sufficient conditions for exact recoverability. Furthermore, we present a polynomial time, sample efficient algorithm that recovers the exact tree when this is possible, or up to the unidentifiability as promised by our characterization, when full recoverability is impossible. Finally, we demonstrate the efficacy of our algorithm experimentally.
翻译:我们研究的是,当观测结果被一个价值为3美元或3美元以上的对称噪声频道损坏时,在共同支持下,在离树结构的Markov随机变量上学习树结构的Markov随机字段(MRF)的问题。对于Ising模型(支持大小=2),过去的工作表明,图形结构只能恢复到叶子群(叶节、其父和兄弟姐妹形成叶子群)和准确的恢复是不可能的。以前没有一项工作处理3个或3个以上的支持大小的设置,而且这一设置确实要更丰富得多。正如我们所显示的那样,当支持大小为3个或3个以上时,叶子群的结构可能是部分或完全可以识别的。我们提供了这一现象的精确特征,并表明可恢复的程度取决于随机变量的联合 PMF。特别是,我们为准确的可恢复性提供了必要和充分的条件。此外,我们提出了一个多元时间,样本有效算法,在可能时可以恢复确切的树木,或者达到我们所承诺的不可辨认性,在完全可恢复性时,我们展示了我们的实验性算法的有效性。