Clinical machine learning models show a significant performance drop when tested in settings not seen during training. Domain generalisation models promise to alleviate this problem; however, there is still scepticism about whether they improve over traditional training. In this work, we take a principled approach to identifying out-of-distribution (OoD) environments, motivated by the problem of cross-hospital generalisation in critical care. We propose model-based and heuristic approaches to identify OoD environments and systematically compare models with different levels of held-out information. We find that access to OoD data does not translate into increased performance, pointing to inherent limitations in defining potential OoD environments, possibly due to data harmonisation and sampling. Echoing similar results with other popular clinical benchmarks in the literature, our findings suggest that new approaches are required to evaluate robust models on health records.