We study the problem of learning tree-structured Markov random fields (MRF) on discrete random variables with common support when the observations are corrupted by unknown noise. As the presence of noise in the observations obfuscates the original tree structure, the extent of recoverability of the tree-structured MRFs under noisy observations is brought into question. We show that in a general noise model, the underlying tree structure can be recovered only up to an equivalence class where each of the leaf nodes is indistinguishable from its parent and siblings, forming a leaf cluster. As the indistinguishability arises due to contrived noise models, we study the natural k-ary symmetric channel noise model where the value of each node is changed to a uniform value in the support with an unequal and unknown probability. Here, the answer becomes much more nuanced. We show that with a support size of 2, and the binary symmetric channel noise model, the leaf clusters remain indistinguishable. From support size 3 and up, the recoverability of a leaf cluster is dictated by the joint probability mass function of the nodes within it. We provide a precise characterization of recoverability by deriving a necessary and sufficient condition for the recoverability of a leaf cluster. We provide an algorithm that recovers the tree if this condition is satisfied, and recovers the tree up to the leaf clusters failing this condition.
翻译:当观测结果被未知噪音破坏时,我们研究在离树结构的Markov随机字段(MRF)上学习离树结构的随机变量(MRF)的问题,当观测结果被不明噪音破坏时,我们共同支持这些随机变量。随着观测发现噪音的噪音模糊了原始树结构,在噪音观测中树结构的MRF的可恢复程度受到质疑。在一般噪音模型中,根植树结构只能恢复到等值类,即每个叶结点与母和兄弟姐妹无法区分,形成一个叶团。随着不可分性因混杂的噪音模型而产生,我们研究自然的K-ary对称频道噪音模型,其中每个节点的价值在支持的概率上变化成一个统一值,其可能性是不平等的和未知的。在这里,答案变得更加细微。我们显示,如果支持大小为2,二对称通道的噪音模型,叶组仍然是不可分辨的。从支持的大小和上层组群落组的可恢复性,那么叶组组的可恢复性组的可恢复性就由无法精确的树组复原性来决定。