Digital twins hold substantial promise in many applications, but rigorous procedures for assessing their accuracy are essential for their widespread deployment in safety-critical settings. By formulating this task within the framework of causal inference, we show it is not possible to certify that a twin is "correct" using real-world observational data unless potentially tenuous assumptions are made about the data-generating process. To avoid these assumptions, we propose an assessment strategy that instead aims to find cases where the twin is not correct, and present a general-purpose statistical procedure for doing so that may be used across a wide variety of applications and twin models. Our approach yields reliable and actionable information about the twin under only the assumption of an i.i.d. dataset of real-world observations, and in particular remains sound even in the presence of arbitrary unmeasured confounding. We demonstrate the effectiveness of our methodology via a large-scale case study involving sepsis modelling within the Pulse Physiology Engine, which we assess using the MIMIC-III dataset of ICU patients.
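The abstract makes two claims: observational data can refute a twin but cannot certify it, and a test can stay sound under arbitrary unmeasured confounding. A minimal sketch of why both can hold uses the classical no-assumptions (Manski) bounds. This is a swapped-in textbook technique and is not presented as the paper's own procedure. Assume i.i.d. records of an action `A` and an outcome `Y` in [0, 1], and assume that `Y` equals the potential outcome `Y(a)` whenever `A == a`. Then `E[Y(a)]` must lie in `[E[Y·1{A=a}], E[Y·1{A=a}] + P(A≠a)]`, whatever the confounding. A twin whose predicted mean lies outside this interval, beyond a Hoeffding margin, is falsified at level `alpha`. A twin inside the interval is not thereby certified. The names `manski_bounds` and `falsify_twin` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def manski_bounds(actions: Sequence[int], outcomes: Sequence[float], a: int) -> Tuple[float, float]:
    """Empirical no-assumption bounds on E[Y(a)] for outcomes in [0, 1] (hypothetical helper)."""
    if any(not 0.0 <= y <= 1.0 for y in outcomes):
        raise ValueError("outcomes must lie in [0, 1]")
    n = len(actions)
    # Lower bound: unobserved Y(a) for units with A != a is set to 0.
    lower = sum(y for x, y in zip(actions, outcomes) if x == a) / n
    # Upper bound: unobserved Y(a) for units with A != a is set to 1.
    upper = lower + sum(1 for x in actions if x != a) / n
    return lower, upper


def falsify_twin(twin_mean: float, actions: Sequence[int], outcomes: Sequence[float],
                 a: int, alpha: float = 0.05) -> bool:
    """Return True if the twin's predicted E[Y(a)] is rejected at level alpha (hypothetical).

    Each per-unit bound term lies in [0, 1], so Hoeffding's inequality gives a
    one-sided margin; splitting alpha across the two sides keeps the test valid
    under any unmeasured confounding, given i.i.d. data.
    """
    n = len(actions)
    lower, upper = manski_bounds(actions, outcomes, a)
    slack = math.sqrt(math.log(2 / alpha) / (2 * n))
    return twin_mean < lower - slack or twin_mean > upper + slack
```

For example, with 100 records where the 50 units that took action `1` all have outcome `1`, the bounds for `a = 1` are `(0.5, 1.0)`. A twin predicting `0.2` is rejected. A twin predicting `0.7` is not rejected, but that does not show it is correct.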