These two synthetic datasets comprise vital signs, laboratory test results, administered fluid boluses and vasopressors for 3,910 patients with acute hypotension and for 2,164 patients with sepsis in the Intensive Care Unit (ICU). The patient cohorts were built using previously published inclusion and exclusion criteria and the data were created using Generative Adversarial Networks (GANs) and the MIMIC-III Clinical Database. The risk of identity disclosure associated with the release of these data was estimated to be very low (0.045%). The datasets were generated and published as part of the Health Gym, a project aiming to publicly distribute synthetic longitudinal health data for developing machine learning algorithms (with a particular focus on offline reinforcement learning) and for educational purposes.