Discovering causal effects is at the core of scientific investigation but remains challenging when only observational data are available. In practice, causal networks are difficult to learn and interpret, and their inference is limited to relatively small datasets. We report a more reliable and scalable causal discovery method (iMIIC), based on a general mutual information supremum principle, which greatly improves the precision of inferred causal relations while distinguishing genuine causes from putative and latent causal effects. We showcase iMIIC on synthetic data and on real-life healthcare data from 396,179 breast cancer patients from the US Surveillance, Epidemiology, and End Results program. More than 90% of predicted causal effects appear correct, while the remaining unexpected direct and indirect causal effects can be interpreted in terms of diagnostic procedures, therapeutic timing, patient preference or socio-economic disparity. iMIIC's unique capabilities open up new avenues to discover reliable and interpretable causal networks across a range of research fields.