Directed acyclic graphs represent the dependence structure among variables. When learning these graphs from data, different amounts of information may be available for different edges. Although many methods have been developed to learn the topology of these graphs, most of them do not provide a measure of uncertainty in the inference. We propose a Bayesian method, baycn (BAYesian Causal Network), to estimate the posterior probability of three states for each edge: present with one direction ($X \rightarrow Y$), present with the opposite direction ($X \leftarrow Y$), and absent. Unlike existing Bayesian methods, our method requires that the prior probabilities of these states be specified, and therefore provides a benchmark for interpreting the posterior probabilities. We develop a fast Metropolis-Hastings Markov chain Monte Carlo algorithm for the inference. Our algorithm takes as input the edges of a candidate graph, which may be the output of another graph inference method and may contain false edges. In simulation studies, our method achieves high accuracy with small variation across different scenarios and performs comparably to or better than existing Bayesian methods. We apply baycn to genomic data to distinguish the direct and indirect targets of genetic variants.
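To make the idea of a posterior over three edge states concrete, the following is a minimal sketch of a Metropolis-Hastings sampler for a single edge. It is not the baycn algorithm: the function name `mh_edge_posterior`, the state labels, and the likelihood values are hypothetical and chosen only for illustration, and the real method samples the states of all candidate edges jointly while respecting acyclicity. The sketch assumes each state has a user-specified prior probability and a fixed, precomputed likelihood.

```python
import random
from collections import Counter

# Three possible states for one edge between X and Y (labels are illustrative).
STATES = ("forward", "backward", "absent")  # X -> Y, X <- Y, no edge


def mh_edge_posterior(prior, likelihood, n_iter=20000, burn_in=2000, seed=0):
    """Estimate the posterior probability of each edge state by Metropolis-Hastings.

    prior and likelihood are dicts keyed by state; the unnormalized posterior
    of a state is prior[state] * likelihood[state].
    """
    rng = random.Random(seed)

    def weight(state):
        return prior[state] * likelihood[state]

    current = "absent"
    counts = Counter()
    for i in range(n_iter):
        # Symmetric proposal: pick one of the two other states uniformly.
        proposal = rng.choice([s for s in STATES if s != current])
        w_cur, w_prop = weight(current), weight(proposal)
        # With a symmetric proposal the acceptance probability is min(1, ratio).
        if w_cur == 0 or rng.random() < min(1.0, w_prop / w_cur):
            current = proposal
        if i >= burn_in:
            counts[current] += 1

    total = sum(counts.values())
    return {s: counts[s] / total for s in STATES}


# Hypothetical inputs: a prior that favors "absent" and made-up likelihoods.
prior = {"forward": 0.05, "backward": 0.05, "absent": 0.90}
likelihood = {"forward": 8.0, "backward": 1.0, "absent": 0.5}
posterior = mh_edge_posterior(prior, likelihood)
```

Comparing each entry of `posterior` with the corresponding entry of `prior` shows how far the data moved the belief about that edge state, which is the sense in which specified priors serve as a benchmark for interpreting the posterior.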