Complex diseases are caused by a multitude of factors that may differ between patients. As a result, hypothesis tests comparing all patients to all healthy controls can detect many statistically significant variables with inconsequential effect sizes. A few highly predictive root causes may nevertheless generate disease within each patient. In this paper, we define patient-specific root causes as variables subject to exogenous "shocks" which go on to perturb an otherwise healthy system and induce disease. In other words, the variables are associated with the exogenous errors of a structural equation model (SEM), and these errors predict a downstream diagnostic label. We quantify predictivity using sample-specific Shapley values. This formulation allows us to develop a fast algorithm called Root Causal Inference for identifying patient-specific root causes by extracting the error terms of a linear SEM and then computing the Shapley value associated with each error. Experiments highlight considerable improvements in accuracy because the method uncovers root causes that may have large effect sizes at the individual level but clinically insignificant effect sizes at the group level. An R implementation is available at github.com/ericstrobl/RCI.
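The two steps named in the abstract (extract the error terms of a linear SEM, then attribute the diagnosis to each error with a Shapley value) can be illustrated with a minimal Python sketch. It is not the authors' R implementation and makes several simplifying assumptions that do not come from the source: the causal order of the variables is taken as known, the toy coefficient matrix `B` and the diagnosis model are invented for the simulation, the diagnosis is fit with a plain logistic regression, and the Shapley values are computed on the log-odds scale using the closed form for a linear model with independent features. The helper names `extract_errors` and `fit_logistic` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 4

# Hypothetical linear SEM x = B x + e, with variables already in causal order
# (B is strictly lower triangular). All coefficients are made up for illustration.
B = np.zeros((p, p))
B[1, 0], B[2, 1], B[3, 0], B[3, 2] = 0.8, -0.5, 0.3, 0.7
E = rng.laplace(size=(n, p))                 # exogenous errors ("shocks")
X = E @ np.linalg.inv(np.eye(p) - B).T       # x = (I - B)^{-1} e, one row per sample

# Toy diagnosis: only the errors of variables 0 and 2 drive disease (an assumption).
true_logit = 1.5 * E[:, 0] - 1.0 * E[:, 2]
D = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

def extract_errors(X):
    """Residual of each variable regressed on its predecessors in the assumed order."""
    n, p = X.shape
    E_hat = np.empty_like(X)
    for j in range(p):
        A = np.column_stack([np.ones(n), X[:, :j]])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        E_hat[:, j] = X[:, j] - A @ coef
    return E_hat

def fit_logistic(Z, y, n_iter=25, ridge=1e-6):
    """Logistic regression by Newton's method with a tiny ridge term for stability."""
    A = np.column_stack([np.ones(len(Z)), Z])
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-A @ w))
        grad = A.T @ (y - mu) - ridge * w
        H = (A * (mu * (1.0 - mu))[:, None]).T @ A + ridge * np.eye(A.shape[1])
        w = w + np.linalg.solve(H, grad)
    return w[0], w[1:]

E_hat = extract_errors(X)
intercept, beta = fit_logistic(E_hat, D)

# For a linear predictor with independent features, the Shapley value of
# feature i for sample j is beta_i * (e_ij - mean(e_i)), on the log-odds scale.
phi = (E_hat - E_hat.mean(axis=0)) * beta
log_odds = intercept + E_hat @ beta

patient = 0
ranking = np.argsort(-np.abs(phi[patient]))  # candidate root causes, strongest first
print("per-error Shapley values for patient 0:", np.round(phi[patient], 3))
print("variables ranked by |Shapley value|:", ranking)
```

Each row of `phi` gives one patient's attributions, so two patients with the same diagnosis can have different top-ranked variables. That per-sample view is the property the abstract contrasts with group-level hypothesis tests.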