Machine learning has recently demonstrated impressive progress in predictive accuracy across a wide array of tasks. Most ML approaches focus on generalization performance on unseen data that are similar to the training data (In-Distribution, or IND). However, real-world applications and deployments of ML rarely enjoy the guarantee that every encountered example is IND. In such situations, most ML models commonly display erratic behavior on Out-of-Distribution (OOD) examples, such as assigning high confidence to wrong predictions, or vice versa. The implications of such unusual model behavior are further exacerbated in the healthcare setting, where patient health can potentially be put at risk. It is crucial to study the behavior and robustness properties of models under distributional shift, understand common failure modes, and take mitigation steps before the model is deployed. Having a benchmark that shines a light on these aspects of a model is a first and necessary step in addressing the issue. Recent work on improving model robustness in OOD settings has focused largely on the image modality, while the Electronic Health Record (EHR) modality is still largely under-explored. We aim to bridge this gap by releasing BEDS-Bench, a benchmark for quantifying the behavior of ML models over EHR data under OOD settings. We use two open-access, de-identified EHR datasets to construct several OOD data settings to run tests on, and measure relevant metrics that characterize crucial aspects of a model's OOD behavior. We evaluate several learning algorithms under BEDS-Bench and find that all of them generalize poorly under distributional shift. Our results highlight the need and the potential to improve the robustness of EHR models under distributional shift, and BEDS-Bench provides one way to measure progress towards that goal.
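The evaluation protocol the abstract describes — train on an in-distribution cohort, then compare a metric such as AUROC on a held-out IND split against an OOD split — can be sketched as follows. This is a minimal illustration using synthetic data and a logistic-regression model; the cohorts, the covariate-shift mechanism, and the single AUROC metric are all illustrative assumptions, not the benchmark's actual datasets, tasks, or metric suite.

```python
# Sketch of an IND-vs-OOD evaluation in the spirit of BEDS-Bench:
# train on one cohort, measure discrimination (AUROC) on a held-out
# IND split and on a covariate-shifted OOD cohort.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
TRUE_W = np.array([1.0, -0.5, 0.8, 0.0, 0.3])  # hypothetical risk weights

def make_cohort(n, shift=0.0):
    """Synthetic stand-in for EHR features; `shift` moves the feature
    distribution to mimic a distributional change between cohorts."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 5))
    p = 1.0 / (1.0 + np.exp(-(X @ TRUE_W)))  # fixed label mechanism
    y = (rng.random(n) < p).astype(int)
    return X, y

X_tr, y_tr = make_cohort(2000)               # training (IND) cohort
X_ind, y_ind = make_cohort(500)              # held-out IND split
X_ood, y_ood = make_cohort(500, shift=1.5)   # shifted (OOD) cohort

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auroc_ind = roc_auc_score(y_ind, model.predict_proba(X_ind)[:, 1])
auroc_ood = roc_auc_score(y_ood, model.predict_proba(X_ood)[:, 1])
print(f"IND AUROC: {auroc_ind:.3f}  OOD AUROC: {auroc_ood:.3f}")
```

A fuller benchmark would sweep this comparison over many model classes and OOD constructions, and would also track calibration and confidence-related metrics rather than discrimination alone.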